Abstract

We prove a structural result for degree-$d$ polynomials. In particular, we show that any
degree-$d$ polynomial $p$ can be approximated by another polynomial $p_0$, which can be decomposed
as some function of polynomials $q_1, \ldots, q_m$ with the $q_i$ normalized and $m = O_d(1)$, so that
if $X$ is a Gaussian random variable, the probability distribution of $(q_1(X), \ldots, q_m(X))$ does
not have too much mass in any small box.

Using this result, we prove improved versions of a number of results about polynomial threshold
functions, including producing better pseudorandom generators, obtaining a better invariance
principle, and proving improved bounds on noise sensitivity.

1 Introduction

1.1 Polynomial Threshold Functions

A polynomial threshold function (PTF) is a function of the form $f(X) = \mathrm{sgn}(p(X))$ for some
polynomial $p(X)$. We say that $f$ is a degree-$d$ polynomial threshold function if $p$ is of degree at
most $d$. Polynomial threshold functions are a fundamental class of functions with applications to
many fields such as circuit complexity [1], communication complexity [20] and learning theory [14].

We present a new structural result for degree-$d$ polynomials that allows us to obtain improved
versions of a number of results relating to polynomial threshold functions. Our result allows us
to define a new notion of regularity for polynomials for which we can prove an improved version
of the Invariance Principle of [17]. We also obtain a regularity lemma (along the lines of the
main theorem of ^) for this new notion of regularity. Although neither of these theorems will 
be directly comparable to their classical versions (due to the different notions of regularity), the 
combination of our regularity lemma and invariance principle produces a marked improvement over 
previous work. These results in turn allow us to prove better bounds on the noise sensitivity of 
polynomial threshold functions (improving on the bounds of [4] for fixed $d > 3$) and provide us
with an improved analysis of the pseudorandom generators of [16] and [11].

1.2 Anticoncentration and Diffuse Decompositions 

Many of the analytic techniques for dealing with polynomial threshold functions (most notably the 
replacement method (see [71 [15])) work well for dealing with smooth functions of polynomials. In 
order to get these techniques to yield useful results for threshold functions, it is often necessary to
approximate the threshold function by a smooth one. In order to do so, one needs to know that,
with high probability, the value of $p(X)$ does not lie too close to zero. Results of this form have
become known as anticoncentration results. Such a result was proved by Carbery and Wright in
[3]. They prove that for $p$ a degree-$d$ polynomial and $X$ a random Gaussian,

\[ \Pr(|p(X)| \le \epsilon |p|_2) = O(d \epsilon^{1/d}). \tag{1} \]

This bound has proved to be an essential component of many theorems about polynomial
threshold functions. Unfortunately, the presence of the $\epsilon^{1/d}$ term above often leads to results
that have poor $\epsilon$-dependence for moderately large values of $d$, and the lack of a stronger form of
Equation (1) has proved to be a bottleneck for a number of results on polynomial threshold functions.
One might hope to overcome this difficulty by proving an improved version of Equation (1). In
particular, a generic polynomial $p$ can be thought of as a sum of largely uncorrelated monomials,
and thus one might expect $p(X)$ to be Gaussian distributed. Thus, while Equation (1) tells us little
more than the fact that the distribution of $p(X)$ has no point masses, one might expect the stronger
condition that $p(X)$ has bounded probability density function to hold. Unfortunately, this is not
the case in general. For example, if $p$ is the $d$-th power of a linear polynomial, the probability that
$|p(X)| < \epsilon$ will in fact be proportional to $\epsilon^{1/d}$. On the other hand, this counterexample is not
as great an obstacle as it first appears to be. While, in this case, the probability distribution of
$p(X)$ does have poor analytic properties, this is because $p$ can be written as a composition of a
well-behaved (in this case linear) polynomial and a simple, yet poorly-behaved polynomial (the
$d$-th power). The fact that the value of $p$ is governed by the value of this linear polynomial will
allow one to overcome the difficulties posed by poor anticoncentration in most applications.
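
The counterexample above can be checked numerically. The following is a minimal sketch, assuming the linear polynomial is a single standard Gaussian $G$ (so that $p(X) = G^d$); the function names are ours and purely illustrative. It uses the exact identity $\Pr(|G| < t) = \mathrm{erf}(t/\sqrt{2})$ rather than sampling.

```python
import math


def prob_power_below(eps: float, d: int) -> float:
    """Exact Pr(|G|^d < eps) for a standard Gaussian G.

    The event |G|^d < eps is the same as |G| < eps**(1/d), and
    Pr(|G| < t) = erf(t / sqrt(2)).
    """
    t = eps ** (1.0 / d)
    return math.erf(t / math.sqrt(2.0))


def normalized_ratio(eps: float, d: int) -> float:
    """Pr(|G|^d < eps) divided by eps**(1/d).

    If the probability is proportional to eps**(1/d) for small eps,
    this ratio should approach a constant (namely sqrt(2/pi)).
    """
    return prob_power_below(eps, d) / eps ** (1.0 / d)


if __name__ == "__main__":
    for d in (1, 2, 4, 8):
        for eps in (1e-4, 1e-8, 1e-12):
            print(d, eps, prob_power_below(eps, d), normalized_ratio(eps, d))
```

For fixed small $\epsilon$ the probability grows quickly with $d$, which is the poor $\epsilon$-dependence that the text describes.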

In fact, this principle applies more generally. In particular, as we shall show, any polynomial 
p may be approximately represented as the composition of a simple polynomial (i.e. a polynomial 
dependent on few input variables) and an analytically well-behaved polynomial (i.e. one with 
good anticoncentration properties). In order to make this claim rigorous, we provide the following 
definitions: 

Definition. Given a degree-$d$ polynomial $p : \mathbb{R}^n \to \mathbb{R}$, we say that a set of polynomials
$(h, q_1, \ldots, q_m)$ is a decomposition of $p$ of size $m$ if $q_i : \mathbb{R}^n \to \mathbb{R}$ and
$h : \mathbb{R}^m \to \mathbb{R}$ are polynomials so that

• $p(x) = h(q_1(x), \ldots, q_m(x))$

• For every monomial $c \prod_i x_i^{a_i}$ appearing in $h$, we have that $\sum_i a_i \deg(q_i) \le d$

In other words, a decomposition of $p$ is a way of writing $p$ as a composition of a simple polynomial,
$h$, with another polynomial $Q = (q_1, \ldots, q_m)$. The second condition above tells us that if
we expanded out the polynomial $h(q_1(x), \ldots, q_m(x))$, we would never have to write any terms of
degree more than $d$.
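
The second condition in the definition is a purely combinatorial check on the exponents of $h$. A minimal sketch, assuming $h$ is stored as a dictionary from exponent tuples $(a_1, \ldots, a_m)$ to coefficients (a representation chosen here for illustration, not taken from the paper):

```python
from typing import Dict, Sequence, Tuple

# Exponents (a_1, ..., a_m) of one monomial of h.
Monomial = Tuple[int, ...]


def weighted_degree(exponents: Monomial, q_degrees: Sequence[int]) -> int:
    """Return sum_i a_i * deg(q_i) for a single monomial of h."""
    return sum(a * deg for a, deg in zip(exponents, q_degrees))


def satisfies_degree_condition(
    h: Dict[Monomial, float], q_degrees: Sequence[int], d: int
) -> bool:
    """Check that every monomial of h with nonzero coefficient has
    weighted degree at most d."""
    for exponents, coeff in h.items():
        if len(exponents) != len(q_degrees):
            raise ValueError("each monomial needs one exponent per q_i")
        if coeff != 0 and weighted_degree(exponents, q_degrees) > d:
            return False
    return True


if __name__ == "__main__":
    # d = 4, deg(q_1) = 1, deg(q_2) = 2.
    # h(y1, y2) = y1^2 * y2 - 3 * y2^2 has weighted degrees 4 and 4.
    print(satisfies_degree_condition({(2, 1): 1.0, (0, 2): -3.0}, [1, 2], 4))
    # h(y1, y2) = y1 * y2^2 has weighted degree 5, which exceeds 4.
    print(satisfies_degree_condition({(1, 2): 1.0}, [1, 2], 4))
```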

Definition. We say that a tuple of polynomials $(q_1, \ldots, q_m) : \mathbb{R}^n \to \mathbb{R}^m$ is an
$(\epsilon, N)$-diffuse set if for every $(a_1, \ldots, a_m) \in \mathbb{R}^m$ and Gaussian random variable $X$
we have that

\[ \Pr_X(|q_i(X) - a_i| \le \epsilon \text{ for all } i) \le \epsilon^m N, \]

and $\mathbb{E}[|q_i(X)|^2] \le 1$ for all $i$.

We note that while an anticoncentration result need only tell us that the probability distribution
of $p(X)$ contains no point masses, an $(\epsilon, N)$-diffuse set of polynomials will have the probability
density function of the vector $(q_1(X), \ldots, q_m(X))$ average no more than $N$ on any small box. This
is a much stronger notion of "analytically well-behaved". Combining the two definitions above, we
define the notion of a diffuse decomposition.

Definition. Given a polynomial $p$ we say that $(h, q_1, \ldots, q_m)$ is an $(\epsilon, N)$-diffuse decomposition
of $p$ of size $m$ if $(h, q_1, \ldots, q_m)$ is a decomposition of $p$ of size $m$ and if $(q_1, \ldots, q_m)$ is an
$(\epsilon, N)$-diffuse set.
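
As a sanity check on the definition of a diffuse set, consider the simplest possible case, which is our own illustrative example rather than one from the paper: $q_i(X) = X_i$ for $i = 1, \ldots, m$, the coordinate functions of a standard Gaussian. Each coordinate lies in an interval of half-width $\epsilon$ with probability at most $2\epsilon/\sqrt{2\pi}$, so this tuple should be $(\epsilon, (2/\pi)^{m/2})$-diffuse for every $\epsilon$. The sketch below evaluates the box probabilities exactly with `math.erf`.

```python
import math
from typing import Sequence


def interval_probability(a: float, eps: float) -> float:
    """Pr(|G - a| <= eps) for a standard Gaussian G."""
    s = math.sqrt(2.0)
    return 0.5 * (math.erf((a + eps) / s) - math.erf((a - eps) / s))


def box_probability(center: Sequence[float], eps: float) -> float:
    """Pr(|X_i - a_i| <= eps for all i) for independent standard Gaussians X_i."""
    prob = 1.0
    for a in center:
        prob *= interval_probability(a, eps)
    return prob


def coordinate_diffuseness_constant(m: int) -> float:
    """The constant N = (2/pi)^(m/2) for which the coordinate functions
    satisfy box_probability <= eps^m * N."""
    return (2.0 / math.pi) ** (m / 2.0)


if __name__ == "__main__":
    m, eps = 2, 0.1
    bound = eps ** m * coordinate_diffuseness_constant(m)
    for center in ((0.0, 0.0), (0.5, -1.0), (2.0, 2.0)):
        print(center, box_probability(center, eps), bound)
```

The bound is tightest for the box centered at the origin, where the Gaussian density is largest.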

It is not obvious that diffuse decompositions should exist in any useful cases. The main result 
of this paper will be to show that not only can any polynomial be approximated by a polynomial 
with a diffuse decomposition, but that the parameters of this decomposition are sufficient for use 
in a wide variety of applications. 

Theorem 1 (The Diffuse Decomposition Theorem). Let $\epsilon$, $c$ and $N$ be positive real numbers and
$d$ a positive integer. Let $p(X)$ be a degree-$d$ polynomial. Then there exists a degree-$d$ polynomial
$p_0$ with $|p - p_0|_2 \le O_{c,d,N}(\epsilon^N)|p|_2$ so that $p_0$ has an $(\epsilon, \epsilon^{-c})$-diffuse decomposition
of size at most $O_{c,d,N}(1)$.

It should be noted that if $p$ is a polynomial with a diffuse decomposition $(h, q_1, \ldots, q_m)$, then
the distribution of $p(X)$ will be determined in large part by the polynomial $h$, as the distribution
of $(q_1(X), \ldots, q_m(X))$ is controlled by the diffuse property. Thus, Theorem 1 may be thought of as a
structural result for Gaussian chaoses. Theorem 1 may also be thought of as a continuous analogue
of theorems of Green-Tao ([9]) and Kaufman-Lovett ([13]) which say that a polynomial over a
finite field can be decomposed in terms of lower degree polynomials whose output distributions on
random inputs are close to uniform.

Remark. The bound on the size of the decomposition in Theorem 1 is effective, but may be quite
large. Working through the details of the proof would lead to a bound of $A(d + O(1), N/c)$, where
$A(m, n)$ is the Ackermann function. The author believes that a polynomial in $dN/c$ should be
sufficient, but does not know of a proof for this improved bound.

1.3 Applications of the Main Theorem 

Theorem 1 has several applications that we will discuss. The existence of diffuse decompositions
allows us to make better use of the replacement method and achieve a tighter analysis of the
pseudorandom generators for polynomial threshold functions presented in [11] and [16]. We can
also use this theory to improve on the Invariance Principle of [17]. In particular, we come up with
a new notion of regularity for a polynomial, so that for highly regular polynomials their evaluation
at random Gaussian variables and at random Bernoulli variables are close in cdf distance. We then
show that an arbitrary polynomial can be written as a decision tree of small depth, almost all of
whose leaves are either regular or have constant sign with high probability. These theorems of ours
will produce a qualitative improvement over the analogous theorems of [17] and [16]. Finally, we
make use of this technology to prove new bounds on the noise sensitivity of polynomial threshold
functions. Each of these applications will be discussed in more detail in the relevant section of this
paper.

1.4 Overview of the Paper 

In Section 2 we introduce a number of basic concepts that will be used throughout the paper.
Section 3 will contain the proof of Theorem 1 along with some associated lemmas. In Section
4 we discuss some basic facts about diffuse decompositions that will prove useful to us later on.
In Section 5 we discuss our application to pseudorandom generators for polynomial threshold
functions of Gaussians. In Section 6 we state and prove our versions of the invariance principle
and regularity lemma. In Section 7 we discuss our results relating to noise sensitivity problems. In
Section 8 we discuss our results for pseudorandom generators for polynomial threshold functions
with Bernoulli inputs. Finally, in Section 9 we provide some closing remarks.

2 Basic Results and Notation 

2.1 Basic Notation 

We will use the notation $O_a(N)$ to denote a quantity whose absolute value is bounded above by
$N$ times some constant depending only on $a$.

Throughout this paper, the variables $G, X, Y, Z, X^i, Y^i$, etc. will be used to denote multidimensional
Gaussian random variables unless stated otherwise. The coordinates of these variables
will be denoted using subscripts. Thus, $X^i_j$ will denote the $j$-th coordinate of the variable $X^i$.

We also recall here the definition of a polynomial threshold function.

Definition. A function $f : \mathbb{R}^n \to \{\pm 1\}$ is a (degree-$d$) polynomial threshold function (or PTF) if
it is of the form $f(x) = \mathrm{sgn}(p(x))$ for some (degree-$d$) polynomial $p$.

2.2 Basic Facts about Polynomials of Gaussians 

We recall some basic facts about polynomials of Gaussians. We begin by recalling the $L^t$-norm of
a function.

Definition. For a function $p : \mathbb{R}^n \to \mathbb{R}$, we let

\[ |p|_t = \left( \mathbb{E}_X[|p(X)|^t] \right)^{1/t}. \]

We now recall some basic distributional results about polynomials evaluated at random Gaussians.

Lemma 2 (Carbery and Wright). If $p$ is a degree-$d$ polynomial then

\[ \Pr(|p(X)| \le \epsilon |p|_2) = O(d \epsilon^{1/d}), \]

where the probability is over $X$, a standard $n$-dimensional Gaussian.

We will make use of the hypercontractive inequality. The proof follows from Theorem 2 of [18].

Lemma 3. If $p$ is a degree-$d$ polynomial and $t > 2$, then

\[ |p|_t \le \sqrt{t-1}^{\,d} |p|_2. \]

In particular this implies the following Corollary:

Corollary 4 (Weak Anticoncentration). Let $p$ be a degree-$d$ polynomial in $n$ variables. Let $X$ be
a family of standard Gaussians. Then

\[ \Pr(|p(X)| \ge |p|_2/2) \ge 9^{-d}/2. \]

Proof. This follows immediately from the Paley-Zygmund inequality ([19]) applied to $p^2$. □
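
For the reader's convenience, here is one way the Paley-Zygmund step can be carried out. This is a sketch under our own assumptions: we use the inequality in the form $\Pr(Z > \theta \mathbb{E}[Z]) \ge (1-\theta)^2 \mathbb{E}[Z]^2 / \mathbb{E}[Z^2]$ for nonnegative $Z$, choose $\theta = 1/4$ (the source does not state which value is used), and apply the hypercontractive bound of Lemma 3 in the form $|p|_4 \le \sqrt{3}^{\,d} |p|_2$.

```latex
\begin{align*}
\Pr\bigl(|p(X)| \ge |p|_2/2\bigr)
  &= \Pr\bigl(p(X)^2 \ge \tfrac14\, \mathbb{E}[p(X)^2]\bigr) \\
  &\ge \bigl(1 - \tfrac14\bigr)^2 \frac{\mathbb{E}[p(X)^2]^2}{\mathbb{E}[p(X)^4]}
   = \frac{9}{16} \cdot \frac{|p|_2^4}{|p|_4^4} \\
  &\ge \frac{9}{16} \cdot \frac{1}{\sqrt{3}^{\,4d}}
   = \frac{9}{16} \cdot 9^{-d}
   \ge \frac{9^{-d}}{2}.
\end{align*}
```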

We also have the following concentration bound.

Corollary 5. If $p$ is a degree-$d$ polynomial and $N > 0$, then

\[ \Pr_X(|p(X)| > N|p|_2) = O\left(2^{-(N/2)^{2/d}}\right). \]

Proof. Apply the Markov inequality and Lemma 3 with $t = (N/2)^{2/d}$. □
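
The computation behind this proof can be spelled out as follows. This is a sketch assuming Lemma 3 in the form $|p|_t \le \sqrt{t-1}^{\,d}|p|_2$ and the choice $t = (N/2)^{2/d}$, which requires $t \ge 2$; for smaller $N$ the claimed bound is at least $1/4$ and so holds trivially after adjusting the implied constant.

```latex
\begin{align*}
\Pr_X\bigl(|p(X)| > N|p|_2\bigr)
  &\le \frac{\mathbb{E}\bigl[|p(X)|^t\bigr]}{(N|p|_2)^t}
   = \left(\frac{|p|_t}{N|p|_2}\right)^{t}
   && \text{(Markov applied to } |p(X)|^t\text{)} \\
  &\le \left(\frac{(t-1)^{d/2}}{N}\right)^{t}
   \le \left(\frac{t^{d/2}}{N}\right)^{t}
   && \text{(Lemma 3)} \\
  &= \left(\frac{N/2}{N}\right)^{t}
   = 2^{-(N/2)^{2/d}}
   && \text{(since } t^{d/2} = N/2\text{)}.
\end{align*}
```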

2.3 Multilinear Algebra 

The conventions and results discussed in the remainder of this section will be used primarily in
Section 3 and sparingly in the rest of the paper.

We will later need to make some fairly complicated constructions making use of multilinear
algebra. We take this time to review some of the basic definitions and go over some of the notation
that we will be using. We recall that a $k$-tensor is an element of a $k$-fold tensor product of vector
spaces, $A \in V_1 \otimes \cdots \otimes V_k$. Equivalently, it may be thought of as the $k$-linear form
$V_1 \times \cdots \times V_k \to \mathbb{R}$ given by $(v_1, \ldots, v_k) \mapsto \langle A, v_1 \otimes \cdots \otimes v_k \rangle$
(assuming that each of the $V_i$ comes equipped with an inner product). If the $V_i$ come with
isomorphisms to $\mathbb{R}^{n_i}$, then we can associate $A$ with the sequence of coordinates
$A_{i_1 \ldots i_k} = A(e_{i_1}, \ldots, e_{i_k})$.

We recall Einstein summation notation, which says that if we are given a product of tensors
with stated indices, it is implied that we sum over any shared indices. In particular, if $A$ is a
$k_1$-tensor and $B$ a $k_2$-tensor, then the expression

\[ A_{i_1, i_2, \ldots, i_m, j_1, j_2, \ldots, j_{k_1 - m}} B_{i_1, i_2, \ldots, i_m, j_{k_1 - m + 1}, j_{k_1 - m + 2}, \ldots, j_{k_1 + k_2 - 2m}} \]

denotes the $(k_1 + k_2 - 2m)$-tensor $C$ with coordinates

\[ C_{j_1, j_2, \ldots, j_{k_1 + k_2 - 2m}} = \sum_{i_1, i_2, \ldots, i_m} A_{i_1, i_2, \ldots, i_m, j_1, j_2, \ldots, j_{k_1 - m}} B_{i_1, i_2, \ldots, i_m, j_{k_1 - m + 1}, j_{k_1 - m + 2}, \ldots, j_{k_1 + k_2 - 2m}}. \]

Note that if there are no overlapping indices that this product simply denotes the tensor product 
of A and B. If on the other hand, all indices overlap, this denotes the dot product of A and B. We 
will also sometimes group several coordinates into a single coordinate of larger dimension. We will 
try to use upper case letters for indices to indicate that this is happening. 
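
The three cases of the summation convention described above (one shared index, no shared indices, all indices shared) can be illustrated on small arrays. This is a minimal pure-Python sketch with nested lists standing in for tensors; the helper names are ours.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def contract_shared_first_index(A: Matrix, B: Matrix) -> Matrix:
    """A_{i j} B_{i l}: the index i is shared, so it is summed over.

    The result is the 2-tensor C with C[j][l] = sum_i A[i][j] * B[i][l].
    """
    rows = len(A)
    return [
        [sum(A[i][j] * B[i][l] for i in range(rows)) for l in range(len(B[0]))]
        for j in range(len(A[0]))
    ]


def tensor_product(u: Vector, v: Vector) -> Matrix:
    """u_i v_j: no shared indices, so this is the tensor product."""
    return [[ui * vj for vj in v] for ui in u]


def dot(u: Vector, v: Vector) -> float:
    """u_i v_i: all indices shared, so this is the dot product."""
    return sum(ui * vi for ui, vi in zip(u, v))


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    B = [[5.0, 6.0], [7.0, 8.0]]
    print(contract_shared_first_index(A, B))
    print(tensor_product([1.0, 2.0], [3.0, 4.0]))
    print(dot([1.0, 2.0], [3.0, 4.0]))
```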

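We define the norm of a tensor $A$ to be the square root of the sum of the squares of its
coordinates. If $A$ is a $k$-tensor we have the equivalent definitions:

\[ |A|_2^2 = \langle A, A \rangle = \sum_{i_1, \ldots, i_k} |A_{i_1, \ldots, i_k}|^2 = \mathbb{E}_{X^1, \ldots, X^k}\left[ |A_{i_1, \ldots, i_k} X^1_{i_1} X^2_{i_2} \cdots X^k_{i_k}|^2 \right]. \]

For tensor-valued functions $A(X)$ we define the $L^2$-norm by

\[ |A|_2^2 := \mathbb{E}_X[|A(X)|_2^2]. \]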

We will also need the notion of a wedge product of tensors over some subset of their coordinates. 
In particular, if A is a rank-(A; + m) tensor with its first k indices corresponding to spaces of the 
same dimension, we define 



1) v:»o-{fc) Jlvj™ ■ 

ii,...,ik o-GSfe 
Note the important special case here where $A$ is a tensor product of $k$ different 1-tensors,
$A_{i_1 \ldots i_k} = A^1_{i_1} \cdots A^k_{i_k}$. It is then the case that



y«i,...,ife J 

We will think of the derivative operator as taking functions on $\mathbb{R}^n$ whose values are $k$-tensors
to functions on $\mathbb{R}^n$ whose values are $(k+1)$-tensors. In particular, given a tensor-valued function
$A_S(x)$, we define the tensor $D_i A_S(x)$ to have $(i, S)$-coordinate $\partial A_S(x) / \partial x_i$. Note that
this implies that for a vector $X$, $D_i X_i A_S$ is simply the standard directional derivative $D_X A_S$.

Lastly, note that if $p$ is a homogeneous, degree-$d$ polynomial, it has an associated $d$-tensor
$A$ given by $A_{i_1, \ldots, i_d} := D_{i_1} \cdots D_{i_d} p$ (note that this $d$-th order derivative is independent
of the point at which it is being evaluated). Note that $A$ is determined by the property that it is
a symmetric tensor (it is invariant under any permutation of coordinates) so that for any vector $X$,
$A(X, X, \ldots, X) = d!\, p(X)$.



2.4 Strong Anticoncentration 

Strong anticoncentration was an idea first exposed by the author in [11]. It is a heuristic which
states that a polynomial is generally not much smaller than its derivative. We will need to make
use of a generalization of this to sets of several tensor-valued polynomials. In particular we will
prove the following proposition:

Proposition 6 (Strong Anticoncentration). For $1 \le i \le k$ let $A^i_{S_i}(x)$ be a degree-$d_i$, tensor-valued
polynomial on $\mathbb{R}^n$ (i.e. a tensor whose coefficients are degree-$d_i$ polynomials on $\mathbb{R}^n$). Let
$1/2 > \epsilon > 0$. We have that

\[ \Pr_X\left( \prod_{i=1}^k |A^i_{S_i}(X)|_2 \le \epsilon \left| \bigwedge_{j_1, \ldots, j_k} \prod_{i=1}^k D_{j_i} A^i_{S_i}(X) \right|_2 \right) \le \epsilon\, 2^{O(d_1 + d_2 + \cdots + d_k)} O(\sqrt{k})^{k+1} \log(\epsilon^{-1})^k. \]

In order to prove Proposition 6 we will need the following lemma:
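Lemma 7. For $1 \le i \le k$ let $p^i$ be a degree-$d_i$ polynomial on $\mathbb{R}^n$ and let $\delta, \epsilon_i > 0$. Then

\[ \Pr_{X, Y^1, \ldots, Y^k}\left( |p^i(X)| \le \epsilon_i \text{ for all } i, \text{ and } |\det(D_{Y^j} p^i(X))| \ge \delta \right) \le \frac{2^{k+1} \prod_{i=1}^k d_i \prod_{i=1}^k \epsilon_i}{\delta V_k}, \]

where $V_k = \frac{2\pi^{(k+1)/2}}{\Gamma((k+1)/2)}$ is the volume of the unit $k$-sphere.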

Proof. Define the function $f : S^k \to \mathbb{R}^k$ by letting

\[ f(a_0, a_1, \ldots, a_k)_i := p^i(a_0 X + a_1 Y^1 + a_2 Y^2 + \cdots + a_k Y^k). \]

Notice that the matrix with coefficients $D_{Y^j} p^i(X)$ is simply the Jacobian of $f$ at the point
$(1, 0, 0, \ldots, 0)$. Notice that if we replace the random variables $X, Y^1, \ldots, Y^k$ by linear
combinations of each other by making an orthonormal change of coordinates, they are still
independent Gaussians and thus the probability in question is unchanged. We claim that for any
fixed values of $X, Y^1, \ldots, Y^k$, the probability over a random such change of variables that

\[ |p^i(X)| \le \epsilon_i \text{ for all } i, \text{ and } |\det(D_{Y^j} p^i(X))| \ge \delta \]

is at most $\frac{2^{k+1} \prod_{i=1}^k d_i \prod_{i=1}^k \epsilon_i}{\delta V_k}$. Such a statement would clearly imply our lemma.

Note that making such a random change of variables is equivalent to precomposing $f$ with a
random element of the orthogonal group $O(k+1)$. Thus, it suffices to bound

\[ \Pr_{x \in S^k}\left( f(x) \in R \text{ and } |\det(\mathrm{Jac}(f(x)))| \ge \delta \right), \]

where $R \subset \mathbb{R}^k$ is given by $\prod_{i=1}^k [-\epsilon_i, \epsilon_i]$. Let $T$ be the set of $x \in S^k$ so that
$f(x) \in R$ and $|\det(\mathrm{Jac}(f(x)))| \ge \delta$. We know by the change of variables formula for integration that

\[ \int_T |\det(\mathrm{Jac}(f(x)))| \, dx = \int_R |f^{-1}(y) \cap T| \, dy. \tag{2} \]

We note that the right hand side of Equation (2) is at most $\int_R |f^{-1}(y)| \, dy$. By Bezout's Theorem,
the integrand is at most $2 \prod_{i=1}^k d_i$ except on a set of measure 0. Thus,
$\int_R |f^{-1}(y)| \, dy \le 2^{k+1} \prod_{i=1}^k d_i \prod_{i=1}^k \epsilon_i$. On the other hand, the left hand side
of Equation (2) is at least $\delta \mathrm{Vol}(T) = \delta V_k \Pr_{x \in S^k}(x \in T)$. Thus,

\[ \Pr_{x \in S^k}(x \in T) \le \frac{2^{k+1} \prod_{i=1}^k d_i \prod_{i=1}^k \epsilon_i}{\delta V_k}. \]

□



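Corollary 8. For polynomials $p^i$ of degree $d_i$ for $1 \le i \le k$ and for $1/2 > \epsilon > 0$,

\[ \Pr_{X, Y^1, \ldots, Y^k}\left( \prod_{i=1}^k |p^i(X)| \le \epsilon\, |\det(D_{Y^j} p^i(X))| \right) \le \epsilon\, 2^{O(d_1 + d_2 + \cdots + d_k)} O(\sqrt{k})^{k+1} \log(\epsilon^{-1})^k. \]

Proof. We note that the problem in question is invariant under scalings of the $p^i$, and therefore
we may assume that $|p^i|_2 = 1$ for all $i$. We note by Lemma 2 and Corollary 5 that we may ignore
the case where some $|p^i(X)| < \epsilon^{d_i}$ or where some $|p^i(X)| > \epsilon^{-d_i}$ (as the probability that such
an event happens for any $i$ is at most $O(\sum_i d_i \epsilon)$). For each $i$ we may partition the interval
$[\epsilon^{d_i}, \epsilon^{-d_i}]$ into $O(d_i \log(\epsilon^{-1}))$ many intervals each of whose endpoints differ by at most a
factor of 2. Up to a factor of $O(\log(\epsilon^{-1}))^k \prod_i d_i$, it suffices to bound the probability that each
of the $|p^i(X)|$ lies in a specified such interval and that $\prod_{i=1}^k |p^i(X)| \le \epsilon\, |\det(D_{Y^j} p^i)|$.
If the upper endpoints of these intervals are $\epsilon_i$, then this probability is at most the probability that

\[ |p^i(X)| \le \epsilon_i \text{ for all } i, \text{ and } |\det(D_{Y^j} p^i(X))| \ge 2^{-k} \epsilon^{-1} \prod_{i=1}^k \epsilon_i. \]

By Lemma 7 the above probability is at most $\epsilon\, 2^{O(d_1 + d_2 + \cdots + d_k)} O(\sqrt{k})^{k+1}$. Multiplying by
$O(\log(\epsilon^{-1}))^k$ yields our bound.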

□

Proof of Proposition 6. For $Z^i$ a tensor of the same dimension as $A^i_{S_i}$, let $f^i_{Z^i}$ be the
function $f^i_{Z^i}(x) = \langle A^i_{S_i}(x), Z^i \rangle$. Note that



Furthermore, 



il,...,ik j=l 



yi,...,y'=,zi 



E 



Z\...,Zk 



det(I)y./^,(X) 
Now suppose that for some choice of $X$ that



il,...,ik i=l 



(3) 



We have by Corollary U] that with probability at least 2'^^'^^ over the random Gaussians Y^, . . . , 
and Z^,. . . that the left hand side of Equation ^ is at least 



/2. 



By the Markov bound, we have that except for a probability of at most 2^'^^^ the right hand side 
of Equation ([3|) is at most 

,2r 



g220(fc) 



dei[DY^fUX) 



Thus, whenever Equation ^ holds, with probability at least 2^^^^ over y* and we have that 



detrDy./^,(X) 



□ 



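But by Corollary 8 the probability of this happening (even for fixed $Z^i$) is at most

\[ \epsilon\, 2^{O(d_1 + d_2 + \cdots + d_k)} O(\sqrt{k})^{k+1} \log(\epsilon^{-1})^k. \]

Thus, the probability of Equation (3) holding is at most $2^{O(k)}$ times as much, which is still

\[ \epsilon\, 2^{O(d_1 + d_2 + \cdots + d_k)} O(\sqrt{k})^{k+1} \log(\epsilon^{-1})^k. \]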

2.5 Orthogonal Polynomials 

Here we review some basic facts about orthogonal polynomials. Recall that the Hermite polynomials
are an orthonormal basis for polynomials in one variable with respect to the Gaussian inner product.
In particular, they are defined by the properties that

• $H_n : \mathbb{R} \to \mathbb{R}$ is a degree-$n$ polynomial

• $\mathbb{E}[H_n(G) H_m(G)] = \delta_{n,m}$ where $G$ is a one-dimensional Gaussian random variable

Furthermore, we have the relation that $H_n'(x) = \sqrt{n} H_{n-1}(x)$. We can extend this theory to
polynomials in $n$ variables as follows. For $a = (a_1, \ldots, a_n)$ a vector of non-negative integers, we
define the corresponding polynomial $H_a(x) = \prod_{i=1}^n H_{a_i}(x_i)$ on $\mathbb{R}^n$. It is easy to check that
the total degree of $H_a$ is $|a|_1 := \sum_{i=1}^n a_i$ and that $\mathbb{E}[H_a(X) H_b(X)] = \delta_{a,b}$.

Given a polynomial $p$ in $n$ variables, we can always write $p$ as a linear combination of Hermite
polynomials. In fact, it is easy to check that

\[ p(X) = \sum_{|a|_1 \le \deg(p)} c_a(p) H_a(X) \]

where

\[ c_a(p) = \mathbb{E}[p(X) H_a(X)]. \]

We define the $k$-th harmonic component of $p$ to be

\[ \sum_{|a|_1 = k} c_a(p) H_a(X). \]

We say that $p$ is harmonic of degree $k$ if it equals its $k$-th harmonic part.
Note that we can compute the derivative of $H_a$ as

\[ D_i H_a(X) = \sqrt{a_i} H_{a - e_i}(X). \]

This is clearly a vector of polynomials that are harmonic of degree $|a|_1 - 1$. Furthermore, we have
that

\begin{align*}
\mathbb{E}[(D_i H_a(X))(D_i H_b(X))] &= \sum_i \mathbb{E}\left[\sqrt{a_i b_i}\, H_{a - e_i}(X) H_{b - e_i}(X)\right] \\
&= \sum_i \sqrt{a_i b_i}\, \delta_{a - e_i, b - e_i} \\
&= \delta_{a,b} \sum_i a_i \\
&= |a|_1 \delta_{a,b}.
\end{align*}

Additionally, for $a \ne b$ each of the components of $D_i H_a$ is a Hermite polynomial orthogonal to the
corresponding component of $D_i H_b$. Iterating this, we can see that

\[ \mathbb{E}[(D_{i_1} D_{i_2} \cdots D_{i_k} H_a(X))(D_{i_1} D_{i_2} \cdots D_{i_k} H_b(X))] = |a|_1 (|a|_1 - 1) \cdots (|a|_1 - k + 1) \delta_{a,b}. \]

Hence we have

Lemma 9. For $p$ a polynomial of degree $d$,

\[ |D_{i_1} \cdots D_{i_k} p(X)|_2^2 \le d(d-1) \cdots (d-k+1) |p|_2^2 \]

with equality if and only if $p$ is harmonic of degree $d$.
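
The one-variable facts recalled in this section (orthonormality and the derivative relation $H_n' = \sqrt{n} H_{n-1}$) can be verified numerically. The sketch below is our own illustration: it assumes the normalization $H_n = He_n / \sqrt{n!}$, where $He_n$ are the probabilists' Hermite polynomials built from the recurrence $He_{k+1}(x) = x He_k(x) - k He_{k-1}(x)$, and it computes Gaussian expectations exactly from the moments $\mathbb{E}[G^{2j}] = (2j-1)!!$.

```python
import math
from typing import List

Poly = List[float]  # coefficients, lowest degree first


def hermite_coeffs(n: int) -> Poly:
    """Coefficients of the normalized Hermite polynomial H_n = He_n / sqrt(n!)."""
    prev, cur = [1.0], [0.0, 1.0]  # He_0 and He_1
    if n == 0:
        return prev
    for k in range(1, n):
        # He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)
        nxt = [0.0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    scale = 1.0 / math.sqrt(math.factorial(n))
    return [c * scale for c in cur]


def gaussian_moment(j: int) -> float:
    """E[G^j] for a standard Gaussian: 0 if j is odd, (j-1)!! if j is even."""
    if j % 2 == 1:
        return 0.0
    result = 1.0
    for odd in range(1, j, 2):
        result *= odd
    return result


def gaussian_inner_product(p: Poly, q: Poly) -> float:
    """E[p(G) q(G)] computed exactly from Gaussian moments."""
    return sum(
        a * b * gaussian_moment(i + j)
        for i, a in enumerate(p)
        for j, b in enumerate(q)
    )


def derivative(p: Poly) -> Poly:
    """Coefficients of the derivative of p."""
    return [i * c for i, c in enumerate(p)][1:]


if __name__ == "__main__":
    for n in range(4):
        print([round(gaussian_inner_product(hermite_coeffs(n), hermite_coeffs(m)), 12)
               for m in range(4)])
    print(derivative(hermite_coeffs(3)), [math.sqrt(3) * c for c in hermite_coeffs(2)])
```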
2.6 Ordinal Numbers 

A few of our proofs are going to use some basic facts about ordinal numbers that can be written
as polynomials in $\omega$ to show that certain recursive procedures terminate. If $p$ is a polynomial with
non-negative integer coefficients we consider the ordinal number $p(\omega)$. Recall that these numbers
have a comparison operation given by $p(\omega) > q(\omega)$ if and only if the leading coefficient of $p - q$ is
positive. We will need the following lemma:

Lemma 10. There is no infinite sequence of polynomials with non-negative integer coefficients, $p_i$,
so that $p_1(\omega) > p_2(\omega) > p_3(\omega) > \cdots$.

Proof. We prove that all such sequences are finite by induction on $\deg(p_1)$. If $\deg(p_1) = 0$, this
sequence is just a decreasing sequence of non-negative integers and must therefore be finite. Next
suppose for sake of contradiction that we have such an infinite decreasing sequence where
$\deg(p_1) = d$, and that we know that no such infinite sequences exist with $\deg(p_1) < d$. Note that
the $\omega^d$ coefficient of the $p_i$ is a non-increasing sequence of non-negative integers and therefore must
eventually stabilize. Hence there is some $a$ and $N$ so that for all $n > N$, $p_n(\omega) = a\omega^d + q_n(\omega)$
where $q_n$ is a polynomial with non-negative integer coefficients of degree at most $d - 1$. It is clear
that $q_n(\omega) > q_{n+1}(\omega) > \cdots$, so by the inductive hypothesis, this sequence must be finite. □

Lemma 10 allows us to perform transfinite induction. In particular, if we have some sequence
of statements $S(p)$ indexed by polynomials $p$ in one variable with non-negative integer coefficients,
and if furthermore we have that for any $p$,

\[ [S(q) \text{ for all } q \text{ so that } q(\omega) < p(\omega)] \Rightarrow S(p) \]

then $S(p)$ will hold for all $p$. This is true for the following reason. Suppose for sake of contradiction
that $S(p)$ were false for some $p = p_1$. This would imply by the given property that there was some
$p_2$ with $p_2(\omega) < p_1(\omega)$ for which $S(p_2)$ was false. Similarly, given any $p_i$ for which $S(p_i)$ was
false we could find a $p_{i+1}$ with $p_{i+1}(\omega) < p_i(\omega)$ for which $S(p_{i+1})$ was false. This would give
us an infinite sequence of polynomials $p_i$ so that $p_1(\omega) > p_2(\omega) > \cdots$, which would contradict
Lemma 10.
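
The comparison operation on ordinals of the form $p(\omega)$ is simply a lexicographic comparison of coefficients starting from the highest power of $\omega$. A minimal sketch, with polynomials stored as coefficient lists (lowest degree first); the representation and function names are our own:

```python
from typing import List, Sequence


def ordinal_less(p: Sequence[int], q: Sequence[int]) -> bool:
    """Return True if p(omega) < q(omega).

    p and q are lists of non-negative integer coefficients, lowest degree
    first. The comparison is decided by the highest power of omega at
    which the coefficients differ.
    """
    n = max(len(p), len(q))
    padded_p = list(p) + [0] * (n - len(p))
    padded_q = list(q) + [0] * (n - len(q))
    for a, b in zip(reversed(padded_p), reversed(padded_q)):
        if a != b:
            return a < b
    return False  # the two ordinals are equal


def is_strictly_decreasing(seq: Sequence[Sequence[int]]) -> bool:
    """Check that p_1(omega) > p_2(omega) > ... along the sequence."""
    return all(ordinal_less(seq[i + 1], seq[i]) for i in range(len(seq) - 1))


if __name__ == "__main__":
    # omega^2 > 5*omega + 7 > 5*omega + 6 > 5*omega > 4*omega + 1000 > 0
    chain: List[List[int]] = [[0, 0, 1], [7, 5], [6, 5], [0, 5], [1000, 4], [0]]
    print(is_strictly_decreasing(chain))
```

Note that a single step may increase lower-order coefficients arbitrarily (from $5\omega$ to $4\omega + 1000$ above), which is why an ordinal rather than an integer is used as the monovariant; Lemma 10 says that any such chain must nevertheless be finite.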

3 Proof of the Decomposition Theorem 

3.1 Overview of the Proof 

The proof of Theorem 1 comes in two steps. The first is Proposition 11 (below), which states
roughly that if $p$ is a degree-$d$ polynomial so that for a random Gaussian $X$, $|p'(X)|$ is small with
non-negligible probability, then $p$ can be decomposed as a polynomial with smaller norm, plus
a sum of products of lower degree polynomials. Given this proposition, the proof of Theorem 1 is
relatively straightforward. We begin by writing a trivial decomposition of $p$ as $p(x) = \mathrm{Id}(p(x))$.
If this is a diffuse decomposition, we are done. Otherwise, by Proposition 6 there must be a
reasonable probability that $|p'(X)|$ is small. Thus, Proposition 11 allows us to decompose $p$ in terms
of lower-degree polynomials. This gives us a new decomposition of $p$. If it is diffuse, we are done;
otherwise it is not hard to show that at least one of the polynomials in this decomposition can be
decomposed further. We show that this procedure will eventually terminate by demonstrating an
ordinal monovariant which decreases with each step.

In Section 3.2 we state and prove Proposition 11, and in Section 3.3 we complete the proof of
Theorem 1.

3.2 The Decomposition Lemma 

In this section, we will prove the following important proposition that will allow us to write a 
non-diffuse polynomial in terms of lower-degree polynomials. 

Proposition 11. Let $p(X)$ be a degree-$d$ polynomial with $|p|_2 \le 1$ and let $\epsilon, c, N > 0$ be real
numbers so that

\[ \Pr_X(|D_i p(X)|_2 \le \epsilon) > \epsilon^N. \]

Then there exist polynomials $a_i(X), b_i(X)$ of degree strictly less than $d$ with $|a_i(X)|_2 |b_i(X)|_2 \le$



where $k = O_{N,c,d}(1)$. Furthermore, this can be done in such a way that for each $i$,
$\deg(a_i) + \deg(b_i) = d$.

Remark. Unlike the constants implied in Theorem 1, the implied constants in Proposition 11 are
primitive recursive functions of the parameters. Although we do not bound them explicitly, our
techniques show that they are at worst an iterated exponential.

Our proof of Proposition 11 will proceed in stages. First we will show that for such polynomials
$p$, there is a reasonable probability (over $X$, $Y^i$) that $D_i D_{Y^1} D_{Y^2} \cdots D_{Y^{d-1}} p(X)$ will be small.
This is easily seen to reduce to a statement about the rank-$d$ tensor $A_{i_1 \ldots i_d} = D_{i_1} \cdots D_{i_d} p$.
In particular, we know that $A_{i_1 \ldots i_d} Y^1_{i_1} \cdots Y^{d-1}_{i_{d-1}}$ has a reasonable probability of being
small. We then prove a structure theorem telling us that such tensors can be approximated as a sum
of tensor products of lower-rank tensors. This in turn will translate into our being able to
approximate the degree-$d$ part of $p$ by a sum of products of lower degree polynomials.

We begin with the following proposition: 

Proposition 12. Let $c, N > 0$ be real numbers and $d$ a positive integer. Let $\epsilon > 0$ be a real number
that is sufficiently small given $c$, $d$ and $N$. Suppose that $p$ is a degree-$d$ polynomial so that

\[ \Pr_X(|D_i p(X)|_2 \le \epsilon) > \epsilon^N. \]

Then we have that

\[ \Pr_{X,Y}(|D_i D_Y p(X)|_2 \le \epsilon^{1-c}) > \epsilon^{O_{N,c,d}(1)}. \]

We begin with the following Lemma:

Lemma 13. Let $N > 0$ be a real number and let $d$ and $k$ be positive integers. Suppose that $A_i(X)$
is a degree-$d$, tensor-valued polynomial so that for some $1/2 > \epsilon > 0$,

\[ \Pr_X(|A_i(X)|_2 \le \epsilon) > \epsilon^N. \]

Then the probability over Gaussian $X$ that $|A_i(X)|_2 \le \epsilon$ and

\[ \left| \bigwedge_{j_1, \ldots, j_k} (D_{j_1} A_{i_1}(X)) \cdots (D_{j_k} A_{i_k}(X)) \right|_2 \le O_{d,k,N}(\epsilon^{k-N}) \log(\epsilon^{-1})^k \]

is at least $\epsilon^N/2$.

Proof. Note that by decreasing $N$, we may assume that

\[ \Pr_X(|A_i(X)|_2 \le \epsilon) = \epsilon^N. \]

Note that for any tensor $B_{ij}$

il,...,ik ueSk 

= (~-'-)'^-^«li<r-l(l) ■ ■ ■ ^^-kJa-iik) 

~ A -^^lil ' ' ' ^^kjk- 
jl,---,jk 

Thus, for fixed $X$, we have by Lemma 2 that with a probability of at most 1/10 over the $Y^\ell$ we
have that



f\ {D,MX))...{D,^A^{X)) 



ii,...,ik 



f\ {D,MX))...{D.j^A.,^{X)) 



n,---,jk 



>n{l/kd) 



kd 



K--< A {D,,A,{X))...[D,^A,^{X)) 



Therefore, it suffices to show that with probability at least $3\epsilon^N/5$ that $|A_i(X)|_2 \le \epsilon$ and



ji,---,jk 

N 



A {D,MX))---{D,,A,^{X)) 



<Od,fc,7v(e'-^)log(e-^)^ 



1\A: 



For fixed $X$, by Corollary 5 we have that with probability at least 9/10 that for random
$Y^1, \ldots, Y^k$ that $|Y^j_i A_i(X)| \le O_k(1) |A_i(X)|_2$ for all $1 \le j \le k$. Thus, with probability at least
$9\epsilon^N/10$ over $X$ and the $Y^j$, we have that $|A_i(X)|_2 \le \epsilon$ and $|Y^j_i A_i(X)| \le O_k(\epsilon)$ for all $j$.

On the other hand, Proposition 6 implies that with probability at least $1 - \epsilon^N/10$ that



<OkA^)e-''ilogie-'))'ll\Y!AiX)\. 



(4) 



£=1 



Recall that with probability at least $9\epsilon^N/10$ we have that $|A_i(X)|_2 \le \epsilon$ and $|Y^j_i A_i(X)| \le O_k(\epsilon)$.
When this holds, the right hand side of Equation (4) is at most



Hence with probability at least $4\epsilon^N/5$, we have that $|A_i(X)|_2 \le \epsilon$ and

k 



jl,--;jk 1=1 



as desired. 



□ 



Lemma 13 tells us some very strong information about the tensor $D_j A_i(X)$. In order to
understand this better, we will study what it means for a 2-tensor $B_{ij}$ to have
$\bigwedge_{i_1, \ldots, i_k} B_{i_1 j_1} \cdots B_{i_k j_k}$ small. Recall that a 2-tensor can be thought of as a matrix.
We will show that this condition implies that $B_{ij}$ is approximately a matrix of rank at most $k$.

Lemma 14. Suppose that $B_{ij}$ is a tensor and suppose that for some integer $k$ and some $\epsilon > 0$ that

\[ \left| \bigwedge_{i_1, \ldots, i_k} B_{i_1 j_1} \cdots B_{i_k j_k} \right|_2 \le \epsilon^k. \]

Then there exist some vectors $V^\ell_i, W^\ell_j$ so that

\[ \left| B_{ij} - \sum_{\ell=1}^{k-1} V^\ell_i W^\ell_j \right|_2 \le O_k(\epsilon). \]

Proof. We proceed by induction on $k$. If $k = 1$, we have by assumption that $|B_{ij}|_2 \le \epsilon$, so we are
done.

For larger values of $k$, we may assume that

\[ \left| \bigwedge_{i_1, \ldots, i_{k-1}} B_{i_1 j_1} \cdots B_{i_{k-1} j_{k-1}} \right|_2 > \epsilon^{k-1} \]

or otherwise we would be done by the inductive hypothesis.
Consider random Gaussians X^, . . . , X'^. We have that 



E 



ii,...,tk-i 



A ^hji ' ' ' Bik-i,jk-i 



ii,...,tk-i 



> e 



2k-2 



Similarly, 



E 



A ^h,ji---^ik,jk^ji---^. 



A ' ' ' 



< e 



2k 



By Lemma m we have that with probability at least 1/2 that 



A ^'iJi " " " Bik-iJk-i^ji • • • ^jk-\ 



«i,--.,«fc-i 



k-l^ 



Furthermore, by the Markov bound, we can find such X^, . . . , X^ ^ so that 



E 



A ^h,h"'Bik ,jk ^ji'-- ^jk 



ti,...,ik 



< 2e^\ 



Let V[ be the vector BijX-. We have that 



A y.i . . . y.^-i 



2k-2\ 
and



< 2e 



2k 



Notice that the wedge products above are simply standard wedges of vectors. Note that if we 
have vectors v} , . . . that 



v} ^v!' = v} ^ v!"'^ a v!"'- 



where v!^'^ is the projection of vf^ onto the space perpendicular to {u^,u^, 
is easy to see that we have 



k-i\ 



. From here, it 



A ti^ A • • • A n 



u 



Therefore, we have that 

On the other hand, we have that 



E 



X 



.llVt'^\l]=0,ie'). 



where is the tensor obtained from B by replacing each row BijCj with its projection onto 
{V^, V"^, . . . , y'^"^)-'". In particular, this means that each row of B-^ can be written as the corre- 

This means that for some appropriate 



sponding row of B plus an element of {V , V'', . . . ,V' 



fc-l\ 



vectors U , we have that B:[j = Bij — ^i^i VfWj. On the other hand, we note that 



\B^q = E[\Bi-Xjq] 



Ofe(e2). 



Thus, $|B^\perp|_2 = O_k(\epsilon)$, completing our proof. □

We are now prepared to prove Proposition 12.



Proof. Suppose we are given $c$, $d$, $N$ and $\epsilon > 0$ sufficiently small. Suppose that we have a degree-$d$
polynomial $p$ so that

\[ \Pr_X(|D_i p(X)|_2 \le \epsilon) > \epsilon^N. \]

Note that by Lemma [2] that this implies that Ex[\Dip{X)\l] < Od{e''^'^'^). And hence that 
ExmDjp{X)\l]<0,{e~^''^). 

Let $k$ be an integer so that $k > 2N/c$. By Lemma 13 applied to $D_i p(X)$ we have that with
probability at least that 



ii,...,ik i=i 



^ n /'£*:(l-c/2)^ 



Let Bij{X) be the tensor DiDjp{X). By the above and Corollary [5] we have that with probability 
at least over X that \B{X)\2 < Odie'^'^'^) and 



A II-^^^^ 

ii,...,ik ^=1 
Applying Lemma 14 to $B$ at such values of $X$, we have that there are vectors $V^\ell, W^\ell$ so that



fc-i 



B 



1=1 



We note that we can replace the $V^\ell$ in such a decomposition with an orthonormal basis for the space that they span by adjusting the $W^\ell$ accordingly. We then have that



2 

il2 



k-l 



1=1 2 

< {\B\2 + Oc4,N{^)f 

< Oc,d,iv(e-''^^). 

Therefore |W/|2 < 0c,d,jv(e~^°'^) for each £. 

Now given a random Gaussian vector Y, there is a probability of at least Qkie^'^''^^'') that 
ll^V^^I < g2(iAf+i £qj. ga,ch £. Furthermore, by Lemma[5l for e sufficiently small the probability that 



k-l 



< e 



l-3c/4 



is much less than this. Hence for such X (which occur with probability at least e^/2), there is a 
probability of at least e'^c,d,]v(i) Qygj. y ^^j^^t 



k-l 



{B,,-EvfW^)Y, 



< 



gl-3c/4 



and 



\YiVl\ < e 



2dN+l 



for each $\ell$. The latter implies that $|Y_i V_i^\ell W_j^\ell|_2 < \epsilon$ for each $\ell$, and thus,



BijYi\2 < 



k-l 



{B,, - E VfW^)Y, 



i=i 



k-l 



+ Y,\Y,VfW^\2<e'- 



e=i 



Thus, with probability at least $\epsilon^{O_{c,d,N}(1)}$,

$$|D_i D_Y p(X)|_2 < \epsilon^{1-c}.$$

□



Iterating Proposition 12 will tell us that a polynomial with a reasonable chance of having a small derivative will also have partial higher order derivatives that are small. Considering the $d$-th order derivatives, this reduces to a statement about the rank-$d$ tensor corresponding to our polynomial. We would like to claim that such tensors can be approximately decomposed as a sum of products of lower rank tensors. In order to conveniently talk about such products we introduce some notation. If $S = \{a_1, \ldots, a_k\}$ is a set of natural numbers, we let $U_{i_S}$ denote a tensor on the indices $i_{a_1}, i_{a_2}, \ldots, i_{a_k}$.
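Proposition 15. Let $d$ be an integer, and let $c, N, \epsilon > 0$ be real numbers. Let $A$ be a rank-$d$ tensor with $|A|_2 \le 1$ and

$$\Pr_{X^1, \ldots, X^{d-1}}\left(\left|A_{i_1, \ldots, i_d} X^1_{i_1} \cdots X^{d-1}_{i_{d-1}}\right|_2 < \epsilon\right) > \epsilon^N.$$

Then there exist tensors $U^\ell, V^\ell$, $1 \le \ell \le k = O_{c,d,N}(1)$, and sets

$$\emptyset \ne S(\ell) \subsetneq \{1, 2, \ldots, d\}, \quad \bar{S}(\ell) = \{1, 2, \ldots, d\} \setminus S(\ell)$$

such that $|U^\ell|_2 |V^\ell|_2 \le O_{c,d,N}(|A|_2 \epsilon^{-c})$ for all $\ell$ and

$$\left|A - \sum_{\ell=1}^k U^\ell_{i_{S(\ell)}} V^\ell_{i_{\bar{S}(\ell)}}\right|_2 \le O_{c,d,N}(\epsilon^{1-c}).$$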



Proof. We note that it suffices to prove this result for $|A|_2 = 1$, since the general case would follow from applying this specialized result to $A/|A|_2$.

We will instead prove the stronger claim that given $c, d, N, \epsilon$ there exists a probability distribution over sequences of tensor-valued polynomials $U^\ell, V^\ell$ of degree $O_{c,d,N}(1)$ in the coefficients of $A$, so that for any tensor $A$ satisfying the hypothesis of the proposition, with probability at least $\epsilon^{O_{c,d,N}(1)}$ over our choice of $U^\ell, V^\ell$ in this family,

$$|U^\ell(A)|_2, |V^\ell(A)|_2 \le O_{c,d,N}(\epsilon^{-c})$$

for all $\ell$, and

$$\left|A_{i_1 \ldots i_d} - \sum_{\ell=1}^k U^\ell_{i_{S(\ell)}}(A) V^\ell_{i_{\bar{S}(\ell)}}(A)\right|_2 \le O_{c,d,N}(\epsilon^{1-c}).$$



Given this statement, our proposition can be recovered by picking an appropriate set of $U^\ell$ and $V^\ell$ for our $A$. We assume throughout this proof that $\epsilon$ is at most a sufficiently small function of $c$, $d$ and $N$, since otherwise there would be nothing to prove.

We prove this statement by induction on $d$. For $d = 1$, we already have that $|A_{i_1}|_2 < \epsilon$, and there is nothing to prove. Hence we assume that our statement holds for rank-$(d-1)$ tensors. The basic idea of our proof will be as follows. By assumption, with reasonable probability over $X$, $AX$ will satisfy the inductive hypothesis for a rank-$(d-1)$ tensor. This means that we can write $U^\ell$ and $V^\ell$ as polynomials in $X$ so that with reasonable probability over $X$, $|AX - \sum_\ell U^\ell(X)V^\ell(X)|_2$ is small. Applying Lemmas 13 and 14 we can show that the derivative of this tensor with respect to $X$ is approximately low-rank. This means that the tensor



A - Y,{D,,U'{X))V\X) + U'{X){D,y{X)) 



is approximated by a small sum of products of rank-1 tensors with rank-$(d-1)$ tensors. By making some random guesses, these remaining tensors can be written as polynomials in the coefficients of $A$ with reasonable probability.

Suppose that A is a rank-d tensor satisfying the hypothesis of our proposition. Then with 
probability at least over a choice of X^ , there is a probability of at least e'^ over our choice of 
that 



c/20 



Furthermore, by Corollary [Sj with probability at least 1 — we have that \Ai^^,,,^i^Xl^\2 < e' 
Hence with probability at least over our choice of X^, e'^/'^^ Ai^^^^^^i^X}^ satisfies the hypotheses 
of our proposition as a rank-$(d-1)$ tensor. For each such $X^1$, the induction hypothesis implies that there is a probability of $\epsilon^{O_{c,d,N}(1)}$ over our choice of $U^\ell, V^\ell$ that the appropriate conclusion holds. Therefore, there must be some particular choice of $U^\ell, V^\ell$ so that with probability at least $\epsilon^{O_{c,d,N}(1)}$ over our choice of $X^1$ we have that



\U\e^/''AXl)\2,\V'{e<^/''AXl)\, < 0,,aA^'^"') 



and 



=1 



S(t) 



Or 



lie 



l-c/20 



)• 



Letting U'\X^) := e-^/^^U^{e^/^^AX^) and y'^(Xi) := e-''/^°V^{e''/^^AX^), we can rephrase the 



last two equations as 



and 



\U'\x')\2,\V'\x%<Or,d,N(.e-'/'') 



AXI-YU'^. (X^)V'^, (X^ 



We will demonstrate that given a correct choice of such $U'^\ell$ and $V'^\ell$ we can construct new polynomials $U^\ell(A)$, $V^\ell(A)$ that satisfy the necessary conditions with probability at least $\epsilon^{O_{c,d,N}(1)}$.

Let $T_i(X^1)$ be the tensor-valued polynomial whose coefficients are the concatenation of the coefficients of

k 

Tit 



and the coefficients of the $\epsilon U'^\ell(X^1)$ and $\epsilon V'^\ell(X^1)$. We have that for some $N_1 = O_{c,d,N}(1)$, with probability at least $\epsilon^{N_1}$, $|T_i(X^1)|_2 \le O_{c,d,N}(\epsilon^{1-c/20})$. We apply Lemma 13 with $k' > 10N_1/c$ and then Lemma 14 (as in the proof of Proposition 12) to show that there exist tensors $W^\ell$, so that

fe'-i 



< Oc,d,N{e 



l-3c/20 



We alter and to maintain the same sum Yl£=i^ ^i^j- This sum can be thought of as a rank 
$k' - 1$ matrix. Note that by the theory of singular values it can always be expressed in the form $\sum_{\ell=1}^{k'-1} C_\ell W_i^\ell Z_j^\ell$ where $C_\ell$ are positive real numbers and $\{W^\ell\}$, $\{Z^\ell\}$ are orthonormal sets. Note that the $|C_\ell|$ are no more than $|D_j T_i(X^1)|_2$. The expectation of this (over $X^1$) is bounded in terms of the sizes of the $U'^\ell$ and $V'^\ell$. These in turn can be no larger than $\epsilon^{-O_{d,c,N}(1)}$ by Lemma 2 since they are small with reasonable probability. Thus, by Corollary 5, with probability at least $\epsilon^{O_{c,d,N}(1)}$ over our choice of $X^1$, we have



DjTiiX^) - &Wlz] 



£=1 



< Oe,.,^(6l-3^/20) 



where $\{W^\ell\}$, $\{Z^\ell\}$ are orthonormal sets, and $C_\ell < \epsilon^{-N_2}$ for all $\ell$ and some $N_2 = O_{c,d,N}(1)$. Furthermore, we may assume that each $C_\ell$ is at least $\epsilon$, since otherwise we could remove the corresponding term in $\sum_{\ell=1}^{k'-1} C_\ell W_i^\ell Z_j^\ell$ without affecting the required properties.

Let

$$S_{ij} = D_j T_i(X^1) - \sum_{\ell=1}^{k'-1} C_\ell W_i^\ell Z_j^\ell.$$

Note that if $S_{ij}W_i^\ell$ is non-zero for some $\ell$, we can absorb it into the $\ell$-th term of the sum, obtaining a new decomposition of the form specified above with $|S_{ij}|_2$ smaller than it was before. By taking $|S_{ij}|_2$ minimal, we can assume that $S_{ij}W_i^\ell = 0$ and similarly that $S_{ij}Z_j^\ell = 0$ for all $\ell$.

Let $M$ be a sufficiently large constant depending only on $c, d, N$. Let $C'_\ell$ be random real numbers in the range $[\epsilon, \epsilon^{-N_2}]$ and let $Y^\ell$ be random vectors for $1 \le \ell < k'$. We have that with
probability at least eOc.d.Ar(i) ^hat for all a, 6 that lY-^Zf - < O(e^), |5ijF/|2 < C>c,d,jv(e^"'=/^), 
|^Kjj|2 < Oc,d,Ar(e^'^/^°), and \C'^ — C^\ < . We construct polynomials U and V (depending on 

and C^) that have the desired properties when these inequalities hold. 

We think of A as a linear function that takes a vector Xi^ and returns a tensor on the remaining 
d — 1 coordinates. We let Y^^ be the vector 

fe'-i 

y,, =X,,-J2 Y^{DxTj{X^)){DyeTj{X'))/{C''f. 

e=i 

Note that 

fc'-i 

DxTi{X^) = SijXj + ^ C^WlXjZ^j. 
e=i 

Similarly, 

fc'-i 

DyeTiiX^) = SijYl + ^ C'W^^y/zj 

1=1 

Thus, for M sufficiently large, with high probability over X we have that 

{DxT^{X^)){Dy,Tj{X^)) = iC'fXiWl + O.^dA^^-^"^')- 
Also, with high probability over X this is at most 

On the other hand, 

fe'-i 

Y,Zf = X,Zf - J2 Y^ADxTj{X')){DytTj{X'))/{c"f 

1=1 

= x,zl - ^ 0(6^-2^-2) _ ^i^c'fx^wl + o^^aA^^-^'hMiC'f 

Hence for M sufficiently large, with high probability over X we have that 
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for all I. If this holds, then 

fc'-i 

DyTiiX') = SijYj + ^ WjC%Z^ 
e=i 

k'-l 

1=1 

= SijXj + Oc,d,N{e^~^^^^)- 

With high probability over X, this is at most Oc,d,Af(e^~^'^^^)- If this is the case, it means that 
\DyU'^\2, \DyV'^\2 < Oc4,N{e-^^h and that 

Note also that 

AX,, -AY,, = Y,AYl,{DxT,{X')){Dy.T,{X^))/{C''f. 
1=1 

Thus, we may approximate AXi, by 
k 

Y{DyU%){X'))V''^{X') + U%){X'){DYV'yj{X')) 
e=i 

k'-i 

+ ^ AYl{DxT,{X')){Dy,T,{X'))/{C''f 
l=\ 

and with high probability over X the error is Oc,d,Af (e^"^'^''^)- On the other hand it should be noted 
that the above is linear in X and thus, may be thought of as a rank-d tensor, 5, applied to X at 
one coordinate. We have that with high probability over X that 

Thus, by Lemma [21 we have that 

Furthermore, the tensor B is obviously given as a sum of products of pairs of lower-rank tensors on 
appropriate subsets of the coordinates. In order to complete our proof we need to show that these 
lower-rank tensors have size at most Oc,d,Af (e""^)- We already know that y^^(X^), l]'^g^^|/^{X^) 

and are appropriately bounded. The other tensors are expressed implicitly as linear functions 
in X with tensor valued outputs. By Lemma [21 it suffices to show for these tensors that with high 
probability over X that the output is Oc,<i,7v(e"'^)- This holds for L'yf7's(^)(Xi) and L>yy'|^(Xi) 

since with high probability SijYj is small. It holds for (L'x?j (^"'^))(^y«^j(^^))/(C''^)^ by Equation 

□ 
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%)(.X^W'W){X^) + \J%{X^){DyV%{X^)) 



We are finally ready to prove Proposition 11.

Proof. Assume that $\epsilon$ is sufficiently small as a function of $c$, $d$ and $N$ (for otherwise there is nothing to prove).

Consider such a polynomial $p$. We claim that for any $k < d$ and any $c' > 0$ that



This is proved by induction on $k$. The $k = 0$ case is given and the inductive step follows immediately from Proposition 12. Applying this statement for $k = d - 1$, we note that

$$D_i D_{Y^1} \cdots D_{Y^k} p(X)$$



is independent of $X$. Let $A_{i_1, \ldots, i_d} = D_{i_1} \cdots D_{i_d} p(X)$ be the symmetric $d$-tensor associated to $p$. We have by Lemma 9 that $|A|_2 = \sqrt{d!}\,|p^{[d]}|_2 \le \sqrt{d!}$, and thus, $A/d!$ satisfies the hypothesis of Proposition 15. Hence we can find tensors $U^\ell$ and $V^\ell$ with the properties specified by that proposition so that $|U^\ell|_2|V^\ell|_2 \le O_{c,d,N}(\epsilon^{-c})|p^{[d]}|_2$. Since $A$ is symmetric, we have that



A 



cr&s^ e=i 



(3(1)) «a(Sp)) 



l-c\ 



(6) 



Note that in the above, since the sum over $\sigma$ has already added the permutations of $U^\ell$ and $V^\ell$ over their indices, we may replace $U^\ell$ and $V^\ell$ by their symmetrizations without affecting the above sum. Let $U^\ell$ be rank $d_\ell$ and $V^\ell$ be rank $d - d_\ell$. Let $a_\ell(X)$ be the degree-$d_\ell$ harmonic part of the polynomial $X \mapsto U^\ell(X, X, \ldots, X)$. Define $b_\ell(X)$ similarly with respect to $V^\ell$. By Lemma 9 we have that $|a_\ell|_2|b_\ell|_2 \le O_d(|U^\ell|_2|V^\ell|_2) = O_{c,d,N}(\epsilon^{-c})|p^{[d]}|_2$. Now consider the tensor given by



$$D_{i_1} D_{i_2} \cdots D_{i_d}\left(p(X) - \sum_{\ell=1}^k a_\ell(X) b_\ell(X)\right).$$

This is easily seen to be the tensor given in Equation (6), and hence has size $O_{c,d,N}(\epsilon^{1-c})$. On the other hand, by Lemma 9 this can be seen to be $\sqrt{d!}$ times the size of the degree-$d$ harmonic part of the polynomial

$$p(X) - \sum_{\ell=1}^k a_\ell(X) b_\ell(X).$$

This completes our proof. □



3.3 Proof of the Main Theorem 

We are now prepared to prove the Diffuse Decomposition Theorem. The basic idea of the proof is fairly simple. We maintain decompositions of polynomials approximately equal to $p$. We show using Proposition 11 that if this decomposition is not diffuse then we can replace it by a simpler one while introducing at most a small error. This new decomposition is simpler in the sense that an associated ordinal number is smaller, and we will use transfinite induction to prove that this process will eventually terminate, yielding an appropriate decomposition.

Proof of TheoremUi We assume for convenience that A^ and are integers. Throughout we will 
assume that $N$, $c$, $d$ and $\epsilon$ are fixed.

We define a partial decomposition of our polynomial $p$ to be a set of the following data:

• A positive integer $m$

• A polynomial $h : \mathbb{R}^m \to \mathbb{R}$

• A sequence of polynomials $(q_1, \ldots, q_m)$ each on $\mathbb{R}^n$ with $|q_i|_2 = 1$ for each $i$

• A sequence of integers $(a_1, \ldots, a_m)$ with $a_i$ between $0$ and $4 \cdot 3^i(N+1)/c - 1$.

Furthermore, we require that each $q_i$ is non-constant, and that for any monomial $\prod x_i^{\alpha_i}$ appearing in $h$ that $\sum \alpha_i \deg(q_i) \le d$.

We say that such a partial decomposition has complexity at most $C$ if the following hold:

• $m \le C$

• $|p(X) - h(\epsilon^{a_1c/(2\cdot 3^1)}q_1(X), \epsilon^{a_2c/(2\cdot 3^2)}q_2(X), \ldots, \epsilon^{a_mc/(2\cdot 3^m)}q_m(X))|_2 \le C\epsilon^N$

Finally, we define the weight of a partial decomposition as follows. First we define the polynomial

$$w(x) = \sum_{i=1}^m x^{\deg(q_i)}\left(4 \cdot 3^i(N+1)/c - a_i\right).$$

We then let the weight of the decomposition be $w(\omega)$.
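Since the weight $w(\omega)$ is a polynomial in $\omega$ with non-negative integer coefficients, two weights can be compared by comparing coefficients lexicographically from the highest power of $\omega$ downward. The following Python sketch illustrates this comparison. The function name `weight`, the integer parameter `inv_c` (standing in for $1/c$, assumed here to be an integer) and the explicit degree bound `d` are illustrative assumptions and not notation from the text.

```python
def weight(degs, a, N, inv_c, d):
    """Coefficients of w(x) = sum_i x^{deg q_i} * (4*3^i*(N+1)/c - a_i),
    listed from the x^d coefficient down to the constant term.

    degs[i-1] is deg(q_i) and a[i-1] is a_i. inv_c stands in for 1/c and is
    assumed to be an integer (a simplifying assumption of this sketch).
    """
    coeffs = [0] * (d + 1)
    for i, (deg, a_i) in enumerate(zip(degs, a), start=1):
        coeffs[d - deg] += 4 * 3**i * (N + 1) * inv_c - a_i
    # Tuples compare lexicographically, which agrees with the ordinal order
    # on polynomials in omega with non-negative integer coefficients.
    return tuple(coeffs)


# One degree-2 polynomial with a_1 = 0.
start = weight([2], [0], N=1, inv_c=1, d=2)
# Same decomposition with a_1 raised to 1.
bumped = weight([2], [1], N=1, inv_c=1, d=2)
# a_1 raised to 1 and two new degree-1 polynomials added.
split = weight([2, 1, 1], [1, 0, 0], N=1, inv_c=1, d=2)
```

In this toy example both `bumped` and `split` compare below `start`: lowering the top coefficient outweighs any number of added lower-degree terms, which is the kind of strict decrease a transfinite induction on the weight relies on.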
Our result will follow from the following Lemma: 

Lemma 16. Let $p$ be a degree-$d$ polynomial with a partial decomposition of weight $w$ and complexity at most $C$. Then there exists a polynomial $p_0$ with an $(\epsilon, \epsilon^{-c})$-diffuse decomposition of size at most $O_{c,d,N,w,C}(1)$ so that $|p - p_0|_2 \le O_{c,d,N,w,C}(\epsilon^N)$.

Proof. We prove this by transfinite induction on $w$. In particular, we show that either $(h, q_1, \ldots, q_m)$ provides an appropriate diffuse decomposition of a polynomial approximately equal to $p$ or that $p$ has another partial decomposition of complexity $O_{c,d,N,C,w}(1)$ and weight strictly less than $w$ (with finitely many possibilities for the new weight). The inductive hypothesis will imply that we have an appropriate diffuse decomposition in the latter case.

First note that if some $a_i$ is at least $2(N+1)3^i/c$, then the sum of the coefficients of the terms involving $q_i$ in $h(\epsilon^{a_1c/(2\cdot 3^1)}q_1(X), \epsilon^{a_2c/(2\cdot 3^2)}q_2(X), \ldots, \epsilon^{a_mc/(2\cdot 3^m)}q_m(X))$ is $O_C(\epsilon^N)$. Thus, these terms can be thrown away without introducing an error of more than $O_{C,d}(\epsilon^N)$. Doing so to the largest such $q_i$ and shifting all of the larger indices down, perhaps changing the $a_j$, and modifying $h$ appropriately will lead to a new partial decomposition with a new value of $C$ dependent only on $d$ and the old one, and a strictly smaller weight. Hence we assume that $a_i < 2(N+1)3^i/c$ for all $i$.

If $\deg(q_{i+1}) > \deg(q_i)$ for some $i$, we may swap $q_i$ and $q_{i+1}$ (making a similar adjustment to $h$ and setting $a_i$ and $a_{i+1}$ to $0$) to get a partial decomposition of complexity $C$ and strictly smaller weight. Hence we may assume that $\deg(q_1) \ge \deg(q_2) \ge \cdots \ge \deg(q_m)$.

Were it the case that for all $x_1, \ldots, x_m$ that

$$\Pr(|q_i(X) - x_i| < \epsilon \text{ for all } i) \le \epsilon^{m-c}$$

then we would already have an appropriate diffuse decomposition and would be done. Hence we 
may assume that there is a set of Xi so that the above does not hold. By Proposition [H we have 
that with probability at least 1 — e'"~'^/2 that 



ll\i^{x)-x,\>nc,d^^-'^'') 



1=1 



jl,...,jm i = l 



21 



Thus, with probabiUty at least e"^ '^12 both of the above hold, which would imply that 



OcAe 



c/2^ 



Now the wedge product above is a wedge product of vectors, and hence its size is unchanged by making a determinant 1 change of basis to the vectors $D_j q_i$. Hence, letting $V^i$ be the projection of $Dq_i$ onto the orthogonal complement of the space spanned by the $Dq_j$ for $j > i$ we have that the size of the wedge product equals $\prod_{i=1}^m |V^i|_2$. This means that for some $i$ that $|V^i|_2 \le O_{c,d}(\epsilon^{c/3^i})$. Hence
for some i, we have with probability at least ^c,d{^^) over X that |y*(X)|2 < Oc.d{^^^^^)i and that 
this is the largest i for that X for which this holds. Furthermore, by Lemma [9] and Corollary [5] we 
know that when this happens with high probability we also have that the first derivatives of all the 
qi have size Oc,d{\-Og{e~'^Y). 
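The identity used above, that the size of a wedge of vectors equals the product of the norms of the successive orthogonal projections, can be checked numerically. The Python sketch below is only an illustration: the helper names `residuals` and `wedge_size` are hypothetical, and the size is normalized as the volume of the parallelepiped spanned by the vectors, which may differ from the tensor norm used in the text by a constant depending on the number of vectors.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def residuals(vs):
    """For each i, project vs[i] onto the orthogonal complement of the span
    of the vs[j] with j > i (Gram-Schmidt run from the last vector back)."""
    basis = []                      # mutually orthogonal residuals found so far
    out = [None] * len(vs)
    for i in range(len(vs) - 1, -1, -1):
        r = list(vs[i])
        for b in basis:
            bb = dot(b, b)
            if bb > 0:
                coef = dot(r, b) / bb
                r = [x - coef * y for x, y in zip(r, b)]
        basis.append(r)
        out[i] = r
    return out


def wedge_size(vs):
    """Product of the residual norms, i.e. the volume spanned by vs."""
    return math.prod(math.sqrt(dot(r, r)) for r in residuals(vs))


base = [(1, 0, 0), (1, 1, 0), (1, 1, 1)]
# Determinant 1 change of basis: add 5 times the second vector to the first.
sheared = [(6, 5, 0), (1, 1, 0), (1, 1, 1)]
```

Both `wedge_size(base)` and `wedge_size(sheared)` evaluate to 1 up to rounding, matching the determinant of each matrix of rows and illustrating the invariance under a determinant 1 change of basis.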

When this happens is given by the derivative of qj minus an appropriate linear combination 
of the for k > j. Note that for each coefficient, the size of the coefficient times the size of 
is at most the size of the derivative of qj. Hence for k > i, the size of the coefficient is at most 
Oc,d(e"'/^'"'log'^(e"^))- From this it is easy to see that is given by a linear combination of the 
derivatives of the qj with j > i such that the i^^ coefficient is 1 and that all other coefficients have 
size at most 

m 
k=i+l 

Hence for each such X, there are constants Cj = Oc^rf(e^'^/(^'^'^^'^/(^'^'"^) (for j > i) so that the 
derivative of qi + Cjqj at X has size at most Oc^d(e'^/^'). Note that this statement still holds if 
the Cj are rounded to the nearest multiple of e. Since there are e"*^"^^^^ such possible roundings, 
there is some set of Cj so that for the polynomial q{X) = qi{X) + J2j ^j'lji-^)^ have that 
\Diq{X)\2 < Oc,d{e''/'^') with probability at least e'^c:.''^^) over X. 

We now can apply Proposition HI] to Oc,d(e'/^^'^'^"'/^^'^'"^)9(^)- Let D = deg(gi)- Let Q{X) 
be the degree-I? harmonic part of $7c'^rf(e'^/^'^'^'^~'^/^'^'^'"-')g(X). Proposition [11] tells us that there 
are polynomials Ai,B(, of degree strictly less than D with |A£|2|-B£|2 at most Ocfi4{e~^^'^'^^^)\Q\2 
for each I, and so that Q — AiB^ equals a polynomial of degree less than D plus an error of 
norm at most Oc^c,d{^'^^^'^~^^^'^^"^'')- Note that the lower degree polynomial has size at most 

IQI2 + 5^1^^5,12. 

By Corollary [5] and Holder's inequality we have that 

\A,Bi\2 < \A,U\BiU < Od{\A,\2\B,\2) = OcCddQbe-'/^'^)). 

Consider the j among those for which deg(gj) = D for which Cje~°'^^^^'^'^^^ is the largest. Q{X) 
is then some multiple of the degree- harmonic part of gje%^/(2-3^) p^^g gnialler multiples of the 
degree- Z? harmonic parts of other g^.e"''''^/^^'^''^. 

We are now ready to modify our partial decomposition to obtain one of smaller weight. First we take each of the $q_i$ of degree equal to $D$ and replace $q_i$ by the sum of its harmonic degree-$D$ part and the remainder, introducing each as a new $q_j$. This increases the complexity by at most a factor of 2, and increases the weight by an ordinal less than $\omega^D$.

Next we note that $q_j\epsilon^{a_jc/(2\cdot 3^j)}$ can be written as a linear combination of the other $q_k\epsilon^{a_kc/(2\cdot 3^k)}$ (with coefficients less than 1) plus a sum of $A_\ell(X)B_\ell(X)$ plus a polynomial of degree less than
D plus a degree- polynomial of size at most Oc^cdi^^""^^^^^^^"^'^^^) ■ Replacing qj by a normalized 
version of this error polynomial, and adding new q's corresponding to the normalized versions of A£ 
and Bi and the remaining part of degree less than D and modifying h appropriately, we find that we 
have a new partial decomposition of complexity Oc,c,d(l) and weight smaller by — Oc,c,d{^^~^)- 

One thing that needs to be verified about this construction is that the norm of $h$ is not increased by too much. We may think of $h$ as a polynomial whose input variables are the $q_i\epsilon^{a_ic/(2\cdot 3^i)}$. Replacing relevant $q_i$ by the sum of their highest harmonic component plus a lower degree part merely replaces one of the inputs to $h$ by the sum of two new input variables, and thus does not increase the size of $h$ by more than a constant factor. The other step is somewhat more complicated to analyze. Here we replace the single input variable $q_j\epsilon^{a_jc/(2\cdot 3^j)}$ by a sum of the following:

• A sum of other variables $q_k\epsilon^{a_kc/(2\cdot 3^k)}$ with coefficients at most 1

• A new variable corresponding to the error term, with coefficient $O_{c,C,d}(1)$

• A sum of new terms corresponding to the $A_\ell B_\ell$

We claim that the sum of the coefficients of the new terms in h replacing the variable qjc'^^^/^'^'^^^ 
are at most Oc,c,(i(£^^^''^'^^); thus, allowing the complexity to increase by only a bounded amount. 
This is clear for the first two contributing factors above. For the latter two, note that 

IQI2 = Oc(max{Q : deg(gi) = D]) < Oc(Cje-«^-/(2-3^)). 

Thus, after rescaling Q so that the coefficient of 1, we find that IQI2 = Oc{^)- Thus, 

when making the replacement, the sum of the coefficients of the A^Bi terms and the scaled error 
term are all OcCdle""^^^^*^^)- □ 

Our theorem follows from applying Lemma 16 to the partial decomposition $m = 1$, $h(x_1) = |p|_2 x_1$, $q_1(X) = p(X)/|p|_2$ and $a_1 = 0$ of complexity 1 and weight $[6(N+1)/c]\omega^d$. □

4 Basic Facts about Diffuse Decompositions 

The primary use of a diffuse decomposition will be that the existence of a diffuse decomposition will 
allow us to approximate the corresponding threshold function by a smooth function. In particular, 
we show: 

Proposition 17. Let $(h, q_1, \ldots, q_m)$ be an $(\epsilon, N)$-diffuse decomposition of a degree-$d$ polynomial $p$ for $1/2 > \epsilon > 0$. There exists a function $f : \mathbb{R}^m \to [-1, 1]$ so that:

1. $f(q_1(x), q_2(x), \ldots, q_m(x)) \ge \mathrm{sgn}(p(x))$ pointwise.

2. $E[f(q_1(X), q_2(X), \ldots, q_m(X))] - E[\mathrm{sgn}(p(X))] = O_{m,d}(\epsilon N \log(\epsilon^{-1})^{dm/2+1})$.

3. For any $k > 0$, $|f^{(k)}|_\infty = O_{m,k}(\epsilon^{-k})$, where $|f^{(k)}|_\infty$ denotes the largest $k$-th order mixed partial derivative of $f$ at any point.

In order to prove this and for some other applications, we will also need the following statement about the distribution of values of $(q_i(X))$ in a diffuse decomposition:
Lemma 18. Let $(h, q_1, \ldots, q_m)$ be an $(\epsilon, N)$-diffuse decomposition of a degree-$d$ polynomial for some $1/2 > \epsilon > 0$. Letting $x = (q_1(X), q_2(X), \ldots, q_m(X))$ for $X$ a random Gaussian we have that with probability $1 - O_{m,d}(N\epsilon \log(\epsilon^{-1})^{dm/2+1})$ that

$$|h(x)| \ge \epsilon|D_{i_1}h(x)|_2 \ge \epsilon^2|D_{i_1}D_{i_2}h(x)|_2 \ge \cdots \ge \epsilon^d|D_{i_1} \cdots D_{i_d}h(x)|_2. \quad (7)$$



Proof. First we note that for some $B = O_m(\log(\epsilon^{-1})^{d/2})$, by Corollary 5, $|q_i(X)| < B$ for all $i$ with probability at least $1 - \epsilon$. Hence it suffices to bound the probability that Equation (7) fails while $|q_i(X)| < B$ for all $i$. We let $R \subset \mathbb{R}^m$ be the region for which this fails. We bound the probability that $x \in R$ by covering $R$ by axis aligned boxes of side length $2\epsilon$ and using the fact that $(q_1, \ldots, q_m)$ is a diffuse set. In particular, consider the union of all axis aligned boxes of side length $2\epsilon$ whose endpoints are integer multiples of $2\epsilon$ and which contain some point of $R$. Call the union of all such boxes $R'$. Note that since $R'$ is a disjoint union of boxes so that for each such box the probability that $x$ lies in this box is at most $N$ times its volume, we have that $\Pr(x \in R) \le \Pr(x \in R') \le N\,\mathrm{Vol}(R')$. Let $R''$ be the set of points $y \in \mathbb{R}^m$ so that $y$ is within $2\sqrt{m}\epsilon$ of some point in $R$. Note that $R'' \supset R'$. Thus, it suffices to prove that

$$\mathrm{Vol}(R'') = O_{m,d}(\epsilon \log(\epsilon^{-1})^{dm/2+1}).$$



Let $Y$ be an $m$-dimensional Gaussian. Note that $R''$ is contained in a ball of radius $O_m(B)$. Hence since the probability density function of $BY$ is at least $\Omega_m(B^{-m}\,dV)$ on this region, we have that $\mathrm{Vol}(R'') = O_m(B^m \Pr(BY \in R''))$. Define the polynomial $H(x) = h(Bx)$. It now suffices to show that with probability at most $O_{d,m}(\epsilon \log(\epsilon^{-1}))$, $Y$ is within $O_m(\epsilon)$ of a point $x$ for which

$$|H(x)| \ge \epsilon|D_{i_1}H(x)|_2 \ge \epsilon^2|D_{i_1}D_{i_2}H(x)|_2 \ge \cdots \ge \epsilon^d|D_{i_1} \cdots D_{i_d}H(x)|_2$$

fails to hold.

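Note that by Proposition 6 for $k = 1$ we have that for any $1/2 > \delta > 0$ that with probability $1 - O_{d,m}(\delta \log(\delta^{-1}))$,

$$|H(Y)| \ge \delta|D_{i_1}H(Y)|_2 \ge \delta^2|D_{i_1}D_{i_2}H(Y)|_2 \ge \cdots \ge \delta^d|D_{i_1} \cdots D_{i_d}H(Y)|_2.$$

If the above holds and $x = Y + z$ for $|z|_2 = O_m(\epsilon)$ we have by Taylor's Theorem that

$$D_{i_1} \cdots D_{i_k}H(x) = D_{i_1} \cdots D_{i_k}H(Y) + \sum_{t=1}^{d-k} \frac{(D_{i_1} \cdots D_{i_{k+t}}H(Y))\, z_{i_{k+1}} \cdots z_{i_{k+t}}}{t!}.$$

Hence we have that

$$\begin{aligned}
|D_{i_1} \cdots D_{i_k}H(x) - D_{i_1} \cdots D_{i_k}H(Y)|_2
&\le \sum_{t=1}^{d-k} \frac{|(D_{i_1} \cdots D_{i_{k+t}}H(Y))\, z_{i_{k+1}} \cdots z_{i_{k+t}}|_2}{t!} \\
&\le \sum_{t=1}^{d-k} \frac{|D_{i_1} \cdots D_{i_{k+t}}H(Y)|_2\, |z|_2^t}{t!} \\
&\le \sum_{t=1}^{d-k} \frac{|D_{i_1} \cdots D_{i_k}H(Y)|_2\, (|z|_2/\delta)^t}{t!} \\
&\le |D_{i_1} \cdots D_{i_k}H(Y)|_2 \left(\exp(O_m(|z|_2/\delta)) - 1\right).
\end{aligned}$$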
Thus, if $\delta = 4\sqrt{m}\epsilon$ and the above holds (which it does with probability $1 - O_{d,m}(\epsilon \log(\epsilon^{-1}))$), then for any point $x$ within $2\sqrt{m}\epsilon$ of $Y$ we have that

$$|D_{i_1} \cdots D_{i_k}H(x) - D_{i_1} \cdots D_{i_k}H(Y)|_2 \le |D_{i_1} \cdots D_{i_k}H(Y)|_2 (e^{1/2} - 1),$$

and thus, Equation (7) holds. Thus, $\Pr(BY \in R'') \le O_{d,m}(\epsilon \log(\epsilon^{-1}))$, so $\mathrm{Vol}(R'') = O_{d,m}(\epsilon \log(\epsilon^{-1})^{dm/2+1})$, completing our proof. □

Corollary 19. Let $(h, q_1, \ldots, q_m)$ be an $(\epsilon, N)$-diffuse decomposition of a degree-$d$ polynomial for $1/2 > \epsilon > 0$. Letting $x = (q_1(X), q_2(X), \ldots, q_m(X))$ for $X$ a random Gaussian, the probability that $x$ is within $\epsilon$ of a point $y$ for which $h(y) = 0$ is $O_{d,m}(N\epsilon \log(\epsilon^{-1})^{dm/2+1})$.

Proof. Note that by the analysis given above, if Equation (7) holds for $2\epsilon$ then for any $y$ with $|x - y| < \epsilon$,

$$|h(x) - h(y)| \le |h(x)|(e^{1/2} - 1) < |h(x)|$$

and thus, $h(y) \ne 0$. Since an $(\epsilon, N)$-diffuse decomposition is also a $(2\epsilon, 2^m N)$-diffuse decomposition, this happens with probability at least $1 - O_{d,m}(N\epsilon \log(\epsilon^{-1})^{dm/2+1})$ by Lemma 18. □

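Proof of Proposition 17. We construct $f$ in a straightforward manner. Let $\rho : \mathbb{R}^m \to \mathbb{R}$ be any smooth, non-negative-valued function supported on the ball of radius 1 so that

$$\int_{\mathbb{R}^m} \rho(x)\,dx = 1.$$

Let $\rho_\epsilon(x) = \epsilon^{-m}\rho(\epsilon^{-1}x)$. We note that

$$\int_{\mathbb{R}^m} \rho_\epsilon(x)\,dx = 1.$$

Let $g : \mathbb{R}^m \to \mathbb{R}$ be the function

$$g(x) = \begin{cases} 1 & \text{if there exists a } y \in \mathbb{R}^m \text{ so that } |x - y| \le \epsilon \text{ and } h(y) \ge 0 \\ -1 & \text{otherwise.} \end{cases}$$

We let $f$ be the convolution $g * \rho_\epsilon$. $f$ takes values in $[-1, 1]$ because

$$f(x) = \int \rho_\epsilon(y) g(x - y)\,dy \le \int \rho_\epsilon(y)\,dy = 1$$

and similarly $f(x) \ge -1$.

$f(q_1, \ldots, q_m)$ is a pointwise upper bound for $\mathrm{sgn} \circ p = \mathrm{sgn}(h(q_1, \ldots, q_m))$ since if $h(x) \ge 0$ then

$$f(x) = \int_{\mathbb{R}^m} \rho_\epsilon(y) g(x - y)\,dy = \int_{B(\epsilon)} \rho_\epsilon(y) g(x - y)\,dy = \int_{B(\epsilon)} \rho_\epsilon(y)\,dy = 1.$$

Bounds on the derivatives of $f$ come from the fact that

$$|f^{(k)}|_\infty = |g * \rho_\epsilon^{(k)}|_\infty \le |g|_\infty |\rho_\epsilon^{(k)}|_1 = O_{m,k}(\epsilon^{-k}).$$

The second property follows from the fact that $f(x) = \mathrm{sgn}(h(x))$ unless $x$ is within $2\epsilon$ of a point $y$ for which $h(y) = 0$. Noting that an $(\epsilon, N)$-diffuse decomposition of size $m$ is also a $(2\epsilon, 2^m N)$-diffuse decomposition, this happens with probability $O_{d,m}(N\epsilon \log(\epsilon^{-1})^{dm/2+1})$. Since $|f(x) - \mathrm{sgn}(h(x))|$ is never more than 2 this provides the necessary bound. □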
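The construction of $f$ as a mollified, slightly enlarged sign function can be illustrated numerically in one dimension. The following Python sketch is a discretized stand-in and not the construction itself: the names `bump` and `make_f`, the grid size `n`, and the approximation of the existential condition in $g$ by a finite search are all assumptions made for illustration.

```python
import math


def bump(t):
    """Smooth, non-negative bump supported on (-1, 1) (unnormalized)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0


def make_f(h, eps, n=200):
    """Discretized one-dimensional version of f = g * rho_eps."""
    ts = [-1 + (2 * i + 1) / n for i in range(n)]   # midpoints in (-1, 1)
    ws = [bump(t) for t in ts]
    total = sum(ws)
    ws = [w / total for w in ws]                    # weights sum to 1

    def g(x):
        # +1 if some grid point y with |x - y| <= eps has h(y) >= 0, else -1.
        ys = [x + eps * (-1 + 2 * i / n) for i in range(n + 1)]
        return 1.0 if any(h(y) >= 0 for y in ys) else -1.0

    def f(x):
        # Riemann-sum approximation of the convolution with rho_eps.
        return sum(w * g(x - eps * t) for w, t in zip(ws, ts))

    return f


f = make_f(lambda x: x, eps=0.1)   # toy choice h(x) = x
```

For this toy $h$, `f` equals 1 (up to rounding) wherever $h \ge 0$, equals $-1$ once $x$ is more than $2\epsilon$ below the zero of $h$, and interpolates smoothly in between, which mirrors the three properties used in the proof.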
Another lemma that will be useful to us is the following:

Lemma 20. Let $(h, q_1, \ldots, q_m)$ be an $(\epsilon, N)$-diffuse decomposition of a degree-$d$ polynomial $p$ for $1/2 > \epsilon > 0$ and $N\epsilon \log(\epsilon^{-1})$ less than a sufficiently small function of $m$ and $d$. Then $|h|_2 \le O_{m,d}(N^d|p|_2)$.

Proof. Consider the probability that $|p(X)| > 2|p|_2$. On the one hand, it is at most 1/4 by the Markov bound. We will show that if $|h|_2$ is more than a sufficiently large constant times $N^d|p|_2$, then the probability must be more than this.

We note by Corollary 5 that with probability at least 7/8 each $q_i(X)$ is $O_d(\log(m)^{d/2})$. We consider the probability that each $q_i(X)$ is at most this size and that $|p(X)| < 2|p|_2$. We bound this probability above by covering the set of $x \in \mathbb{R}^m$ with each $|x_i| = O_d(\log(m)^{d/2})$ so that $|h(x)| < 2|p|_2$ with boxes of side length $\epsilon$. The probability is at most $N$ times the volume of the union of these boxes. Furthermore, the union of these boxes is contained in the set of $x \in \mathbb{R}^m$ with $|x_i| < O_d(\log(m)^{d/2})$ for each $i$ and so that $x$ is within $\epsilon m$ of some point $y$ with $|h(y)| < 2|p|_2$. Call this region $R$. We note that since $R$ is contained in a ball of radius $O_{m,d}(1)$, the volume of $R$ is bounded by some constant times the probability that a random Gaussian $X$ lies in $R$.

By Proposition 6 we have that with probability $1 - O_{d,m}(\epsilon \log(\epsilon^{-1}))$ over Gaussian $X$ that

$$|h(X)| \ge 4\epsilon m|D_{i_1}h(X)|_2 \ge \cdots \ge (4\epsilon m)^d|D_{i_1} \cdots D_{i_d}h(X)|_2.$$

This would imply that for any $y$ within $m\epsilon$ of $X$ that $|h(X) - h(y)| \le |h(X)|/2$ by means of the Taylor series for $h(y)$. On the other hand $|h(X)| > 4|p|_2$ with probability at least $1 - O_d((|p|_2/|h|_2)^{1/d})$ by Lemma 2. Thus, the probability that $|h(x)| < 2|p|_2$ is at most

0,,„(6 log(l + + (Ipb/l/^b)'/") + 1/8. 

Hence we have that 

1/4 < Pr(|p(X)| > 2|p|2) < Orf,™(iVelog(l + e-^) + iV(|p|2/|/i|2)^/'') + 1/8. 
Thus, if A^elog(e~^) is less than some sufficiently small function of d,m, then \h\2 = Od,m{X'^). 

□ 

Fundamentally, having a diffuse decomposition is useful because it allows us to improve our 
application of the replacement method. The following proposition presents this technique in fair 
generality. 

Proposition 21. Let $p_0 : \mathbb{R}^n \to \mathbb{R}$ be a degree-$d$ polynomial with an $(\epsilon, N)$-diffuse decomposition $(h, q_1, \ldots, q_m)$ for some $1/2 > \epsilon > 0$. Let $n_i$ be positive integers so that $n = \sum_{i=1}^\ell n_i$. We can then consider $p_0$ and each of the $q_i$ as functions on $\mathbb{R}^{n_1} \times \cdots \times \mathbb{R}^{n_\ell}$.

Let $X^1, \ldots, X^\ell$ and $Y^1, \ldots, Y^\ell$ be independent random variables, where $X^j$ and $Y^j$ take values in $\mathbb{R}^{n_j}$ and $Y^j$ is a random Gaussian. Furthermore, assume that for some integer $k > 1$ that for any polynomial $g$ in $m$ variables of degree less than $k$, any $1 \le j \le \ell$ and any $z^1, \ldots, z^\ell$ that, writing $q = (q_1, \ldots, q_m)$,

$$E[g(q(z^1, \ldots, z^{j-1}, X^j, z^{j+1}, \ldots, z^\ell))] = E[g(q(z^1, \ldots, z^{j-1}, Y^j, z^{j+1}, \ldots, z^\ell))].$$

For each $1 \le i \le m$ and each $1 \le j \le \ell$ define

$$Q_{i,j}(x^1, \ldots, x^{j-1}, x^{j+1}, \ldots, x^\ell) := E_{Y^j}[q_i(x^1, \ldots, x^{j-1}, Y^j, x^{j+1}, \ldots, x^\ell)].$$
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Define Ti j to he 



E 
+E 



1 vi+i 



,iY\...,Y^-\X^,...,X')-Q,,,iY\...,Y^-\X^+\...,X' 



And let 

m I 

i=i j=i 

Then we have that 

Pr(po{X\...,X')<o)-Pr(po{Y\...,Y')<o) < Od,m,k (e-^T + eNlog{e-y'^/^+' 



If furthermore, p is a degree-d polynomial so that for some parameters 6,r] > 
Pr{\p{X)-po{X)\<6\p\2),Pr{\p{Y)-po{Y)\<6\p\2) > 1 - ry 



then 



Pr(p{X\...,X')<0) -Pr(p{Y\...,Y')<0) < Od,m,k (e-^T + eN log{e-y^/^+' + 5^/^ + 



When considering Proposition 1211 it might be useful to keep the intended applications in mind. 
In Section [5l we will consider the case where the X^ are chosen from d/c-independent famihes of 
Gaussians. In Section [H we will consider the case where the X^ are Bernoulli random variables. 
Finally, in Section [HI we will consider the case where the X^ are chosen from 4(i- independent families 
of random Bernoullis. 

Proof. We begin by proving the first of the two bounds, and will then use it to prove the second. By rescaling we may assume that $|p_0|_2 = 1$. Let $X = (X^1, \ldots, X^\ell)$ and $Y = (Y^1, \ldots, Y^\ell)$. Let $q$ denote the vector valued polynomial $(q_1, \ldots, q_m)$. We will show that

$$\Pr(p_0(X) < 0) \le \Pr(p_0(Y) < 0) + O_{d,m,k}\left(\epsilon^{-k}T + \epsilon N \log(\epsilon^{-1})^{dm/2+1}\right).$$

The other inequality will follow analogously.

By Proposition 17 there exists a function $f : \mathbb{R}^m \to [0, 1]$ so that

1. $f(x) = 1$ for all $x$ where $h(x) < 0$.

2. $E[f(q(Y))] = \Pr(p_0(Y) < 0) + O_{d,m}(\epsilon N \log(\epsilon^{-1})^{dm/2+1})$.

3. $|f^{(k)}|_\infty = O_{m,k}(\epsilon^{-k})$.

We note that

$$\Pr(p_0(X) < 0) \le E[f(q(X))]$$

and that

$$E[f(q(Y))] \le \Pr(p_0(Y) < 0) + O_{d,m}(\epsilon N \log(\epsilon^{-1})^{dm/2+1}).$$

Hence it suffices to prove that

$$|E[f(q(X))] - E[f(q(Y))]| = O_{d,m,k}(\epsilon^{-k}T). \quad (8)$$
For $0 \le j \le \ell$, let

$$Z^{(j)} := (Y^1, \ldots, Y^j, X^{j+1}, \ldots, X^\ell).$$

In particular, $Z^{(0)} = X$, $Z^{(\ell)} = Y$, and $Z^{(j)}$ is obtained from $Z^{(j-1)}$ by changing the $j$-th block of coordinates from $X^j$ to $Y^j$. We will attempt to bound the left hand side of Equation (8) by bounding

$$\left|E[f(q(Z^{(j)}))] - E[f(q(Z^{(j-1)}))]\right| \quad (9)$$

for each $j$.
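The hybrid sequence $Z^{(j)}$ and the telescoping it enables can be sketched in a few lines of Python. This is only a pointwise illustration of the bookkeeping: the text bounds differences of expectations, whereas the sketch checks the identity for one fixed pair of inputs, and the names `hybrid`, `telescoping_steps` and the toy function `F` are hypothetical.

```python
def hybrid(x_blocks, y_blocks, j):
    """Z^(j): the first j blocks come from Y, the remaining ones from X."""
    return tuple(y_blocks[:j]) + tuple(x_blocks[j:])


def telescoping_steps(F, x_blocks, y_blocks):
    """Differences F(Z^(j)) - F(Z^(j-1)) for j = 1, ..., l."""
    l = len(x_blocks)
    return [
        F(hybrid(x_blocks, y_blocks, j)) - F(hybrid(x_blocks, y_blocks, j - 1))
        for j in range(1, l + 1)
    ]


def F(blocks):
    """Toy stand-in for f(q(.)): sum of squares of all coordinates."""
    return sum(v * v for block in blocks for v in block)


X = ((1, 2), (3,), (4, 5))
Y = ((0, 1), (7,), (2, 2))
steps = telescoping_steps(F, X, Y)
```

The entries of `steps` sum to `F(Y) - F(X)`, so bounding each single-block replacement and summing over $j$ controls the total change, which is how Equation (9) is used to obtain Equation (8).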

Consider the expression in Equation ([9]) for fixed values of Y^, . . . ,Y^^^,X^^^, . . . ,X^. We 
approximate f{q{Z^^^)) and f{q{Z^^^^^)) by Taylor expanding / about (gT, . . . where 

UY\...,Y^-\Z,X^+\...,X') ■.= Q,,,iY\...,Y^~\X^+\...,X') 

= Efe(Z(^-^))]. 
Thus, for some polynomial g of degree k — 1 we have that: 

f{q{Z))=g{q{Z))+ol J] ^ U^Q^AZ) -WliZ)) 

= g{q{Z)) + 0,,^,J J2 9^(^)1' 

yn,...,ife a=l 

= g{q{Z)) + 0,,„,fc ^e-^' ^ - . 

By assumption

$$E[g(q(Z^{(j)}))] = E[g(q(Z^{(j-1)}))].$$

Thus, the expression in Equation (9) is at most



\j=i / 



2 = 1 



Summing over $j$ yields Equation (8), proving the first part of this proposition.

Changing our normalization so that $|p|_2 = 1$, we have that

$$\Pr(p(X) < 0) \le \Pr(p_0(X) - \delta < 0) + O(\eta).$$

Notice that $p_0 - \delta$ has the diffuse decomposition $(h - \delta, q_1, \ldots, q_m)$. Therefore, applying our previous result to this decomposition of $p_0 - \delta$ we have that

$$\Pr(p_0(X) - \delta < 0) \le \Pr(p_0(Y) - \delta < 0) + O_{d,m,k}\left(\epsilon^{-k}T + \epsilon N \log(\epsilon^{-1})^{dm/2+1}\right).$$

On the other hand we have that

$$\Pr(p_0(Y) - \delta < 0) \le \Pr(p(Y) - 2\delta < 0) + O(\eta).$$

Finally, by Lemma 2 we have that

$$\Pr(p(Y) - 2\delta < 0) \le \Pr(p(Y) < 0) + O(d\delta^{1/d}).$$

Combining the above inequalities we find that

$$\Pr(p(X) < 0) \le \Pr(p(Y) < 0) + O_{d,m,k}\left(\epsilon^{-k}T + \epsilon N \log(\epsilon^{-1})^{dm/2+1} + \delta^{1/d} + \eta\right).$$

The other direction of the inequality follows analogously, and this completes our proof. □

5 Application to PRGs for PTFs with Gaussian Inputs 

In [11] the author introduced a new pseudorandom generator for polynomial threshold functions of Gaussian inputs. In particular, for appropriately chosen parameters $N$ and $k$ he lets

$$X = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X^i,$$

where the $X^i$ are independently chosen from $k$-independent families of Gaussians. He shows that for some $k = O(d/c)$ and $N = 2^{O_c(d)} \epsilon^{-4-c}$, for any such $X$, if $Y$ is a random Gaussian and $f$ any degree-$d$ polynomial threshold function then

$$|\mathrm{E}[f(X)] - \mathrm{E}[f(Y)]| < \epsilon. \tag{10}$$

The proof of this is by the replacement method. In particular, $f$ is replaced by a smooth approximation $g$, and bounds are proved on the change in the expectation of $g(X)$ as the $X^i$ are replaced by random Gaussians one at a time. The power of this method is highly dependent on one's ability to find a $g$ that is close to $f$ with high probability and yet has relatively small
higher derivatives. If $f(x) = \mathrm{sgn}(p(x))$, a naive attempt to use the replacement method would use $g = \rho(p(x))$ for $\rho$ a smooth approximation to the sign function. Unfortunately, this approach will have difficulty proving Equation (10) unless $N > \epsilon^{-2d}$. In [11], the author uses a version of Proposition 6 and constructs a $g$ which approximates $f$ as long as an appropriate analogue of

\g{x)\ > e\Di,g{x)\2 > e^\Di,Di,g{x)\2 >... 

holds. The analysis of this is somewhat complicated, involving the development of the theory of the so-called "noisy derivative". Furthermore, for technical reasons this method has difficulty dealing with $N$ smaller than $\epsilon^{-4}$. As a first application of our theory of diffuse decompositions we provide a relatively simple analysis of this generator that works with $N$ as small as $\epsilon^{-2-c}$. In particular we show:

Theorem 22. Given an integer $d > 0$ and real numbers $c, \epsilon > 0$, there exist integers $k = O(d/c)$ and $N = O_{c,d}(\epsilon^{-2-c})$ so that for any random variable

$$X = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} X^i,$$

where the $X^i$ are chosen independently from $k$-independent distributions of Gaussians, and for any degree-$d$ polynomial threshold function $f$,

$$|\mathrm{E}[f(X)] - \mathrm{E}_{Y \sim \mathcal{N}}[f(Y)]| < \epsilon.$$

Proof. We begin by making a few reductions to produce a more amenable case. We assume throughout that $\epsilon$ is sufficiently small. Note that it is sufficient to prove that for $N = \epsilon^{-2-c}$ the error is $O_{c,d}(\epsilon^{1-O(c)})$, since making appropriate changes to $c$ and $\epsilon$ will yield the necessary result. Secondly, we may let $f(x) = \mathrm{sgn}(p(x))$ for $p$ a degree-$d$ polynomial with $|p|_2 = 1$.

By Theorem[T] there exists a degree-d polynomial po with |p — pob = Oc.dle*^^^)) and so that po 
has an (e, e~'^/^)-diffuse decomposition {h,qi, . . . , qm)- It should be noted that by 2-independence, 

KMX) - po(X)p] = K[\p{Y) - poiY^] = 0,,,{e''+'). 

Therefore, by the Markov bound we have with probability at least 1 — that 

\p{X)-po{X)\,\p{Y)-po{Y)\<e''. 

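We note that we may write $Y = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} Y^i$, where the $Y^i$ are independent Gaussians. We define the polynomial

$$p'(Y^1, \ldots, Y^N) := p\Big(\frac{1}{\sqrt{N}} \sum_{i=1}^{N} Y^i\Big).$$

We note that

$$p(X) = p'(X^1, \ldots, X^N)$$

and

$$p(Y) = p'(Y^1, \ldots, Y^N).$$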

It is clear that if we define p'q and q[ analogously, that p'^ has an (e, e'^/^)-diffuse decomposition 
(/i, g']^, . . . , q'j^), and that with probability at least 1 — that 

\p'{X)-p',{X)\,\p\Y)-p',{Y)\<e'. 

We may thus apply Proposition [2T] to p' ,p'q with t] = and 6 = e'^. 

Let $K$ be an even integer less than $k/d$ and more than $6/c$. By $k$-independence of the $X^i$, any polynomial $g$ of degree less than $K$ in the $q_i'$ will have the same expectation evaluated at $X^1, \ldots, X^N$ as at $Y^1, \ldots, Y^N$. Hence by Proposition 21



|E[/(X)] - E[f{Y)]\ = 2 \Fv{p{X\ ...,X^)<0)- Pt{p{Y\ . . . , y^) < 0)| 



e^-^ log(e-i)'^'^/2+i ^ ^-Krj. 



(11) 



Where by the -fC- independence of X, the T above is 

m N 

2Y.Y.^[{q^{Y)-EyMiY\...,Y'')]) 



K' 



By Lemma O this is 



i=i j=i 



m N 



i=i j=i 



K/2 



Letting Z = r^-^ Si^^j ^* (which is a random Gaussian), the expectations in question are 



Vary qi 



N 



N 
This in turn is at most




E 



We bound this with the following lemma, which follows immediately from Claim 4.1 of [5]: 
Lemma 23. For q any degree-d polynomial we have that 



E 



Thus, T is at most 



q{Z)-q 



Thus, by Equation ([TT]) . 
as desired. 



/iV-1 1 



0{d'\q\i/N). 



(m N \ 
i=l j=l J 

|E[/(X)]-E[/(y)]|<Oe,d,„^(e^-'^), 



□ 



6 The Diffuse Invariance Principle and Regularity Lemma 

While the case of Gaussian inputs is very convenient for proving theorems such as the Decomposition 
Theorem, many interesting questions involve evaluation of polynomials on random variables from 
other distributions. Perhaps the most studied of these is the Bernoulli, or hypercube distribution. 

Definition. The $n$-dimensional Bernoulli distribution is the probability distribution on $\mathbb{R}^n$ where each coordinate is randomly and uniformly chosen from the set $\{-1, 1\}$. Equivalently, it is the uniform distribution on the set $\{-1, 1\}^n$.

As we have been using $X$, $Y$, $Z$, etc. to represent Gaussian random variables, we will attempt to use $A$, $B$, etc. for Bernoulli random variables.

A powerful tool for dealing with Bernoulli variables is the use of invariance principles. These are theorems which state that if $p$ is a sufficiently regular polynomial (for some definition of regularity), then the distributions of $p(X)$ and $p(B)$ are similar to each other (generally that they are close in cdf distance). This allows one to make use of results in the Gaussian setting and apply them to
the Bernoulli setting (at least for sufficiently regular polynomials). Since not all polynomials will 
be regular, in order to make use of this idea in a more general context, one also needs a regularity 
lemma. These are structural results that allow us to write arbitrary polynomials of Bernoulli 
random variables in terms of regular ones. 

In this section, we will discuss some of the existing invariance principles and regularity lemmas, and make use of the theory of diffuse decompositions to provide some new ones that will deal better with high degree polynomials. In Section 6.1 we discuss some background information about polynomials of Bernoulli random variables and give a brief overview of existing invariance principles and regularity lemmas. In Section 6.2 we state and prove the Diffuse Invariance Principle, and in Section 6.3 prove the corresponding regularity lemma.

6.1 Basic Facts about Bernoulli Random Variables

6.1.1 Multilinear Polynomials 

For a Bernoulli random variable $B$, any coordinate $b_i$ satisfies $b_i^2 = 1$ with probability 1. This, of course, does not hold in the Gaussian case. Thus, if there is going to be any hope of comparing polynomials on Gaussian and Bernoulli inputs, we must restrict ourselves to polynomials that have no term of degree more than 1 in any variable. In particular, we must restrict ourselves to the case of multilinear polynomials:

Definition. A polynomial $p : \mathbb{R}^n \to \mathbb{R}$ is multilinear if its degree with respect to any coordinate variable is at most 1.

To clarify the relationship between general polynomials and multilinear polynomials we mention 
the following lemma: 

Lemma 24. For every polynomial $p : \mathbb{R}^n \to \mathbb{R}$, there exists a unique multilinear polynomial $q : \mathbb{R}^n \to \mathbb{R}$ so that $q$ agrees with $p$ on $\{-1, 1\}^n$. Furthermore, $\deg(q) \le \deg(p)$.

Proof. To prove the existence of $q$, it suffices to show that the result holds for every monomial $p = \prod_i x_i^{a_i}$. It is clear that this monomial agrees on the hypercube with the multilinear monomial

$$\prod_i x_i^{a_i \pmod 2}.$$

Uniqueness will follow from the fact that any non-zero multilinear polynomial on $\mathbb{R}^n$ is non-vanishing on the hypercube. This follows from the fact that the map from a multilinear polynomial to its vector of values on $\{-1, 1\}^n$ is a surjective linear map of vector spaces of dimension $2^n$, and hence is injective. □
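The existence half of this proof is effectively an algorithm: reduce every exponent mod 2 and merge like terms. A minimal sketch follows, assuming a hypothetical representation of polynomials as dictionaries from exponent tuples to coefficients; the example polynomial is made up for illustration.

```python
from itertools import product


def multilinearize(poly):
    """Return the multilinear polynomial agreeing with `poly` on {-1, 1}^n.

    `poly` maps exponent tuples (a_1, ..., a_n) to coefficients. Each
    exponent is reduced mod 2 and monomials that collide are merged.
    """
    result = {}
    for exponents, coeff in poly.items():
        reduced = tuple(a % 2 for a in exponents)
        result[reduced] = result.get(reduced, 0) + coeff
    return {e: c for e, c in result.items() if c != 0}


def evaluate(poly, point):
    """Evaluate a polynomial in exponent-tuple form at a point."""
    total = 0
    for exponents, coeff in poly.items():
        term = coeff
        for x, a in zip(point, exponents):
            term *= x ** a
        total += term
    return total


# Hypothetical example: p = 3*x1^2*x2 + x2^3*x3 - 2*x1*x3^2 + 5
p = {(2, 1, 0): 3, (0, 3, 1): 1, (1, 0, 2): -2, (0, 0, 0): 5}
q = multilinearize(p)
agrees_on_cube = all(
    evaluate(p, pt) == evaluate(q, pt) for pt in product((-1, 1), repeat=3)
)
```

Here `q` represents $3x_2 + x_2 x_3 - 2x_1 + 5$, which has no exponent above 1 and degree no larger than that of `p`.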

Definition. For any polynomial $p(x)$, let $L(p(x))$ be the corresponding multilinear polynomial as described by Lemma 24.

6.1.2 L^p Norms and Hypercontractivity

As the norms for polynomials of Gaussians have been useful to us, the corresponding norms for the Bernoulli distribution will also be useful.

Definition. Let $p : \mathbb{R}^n \to \mathbb{R}$. For $t \ge 1$, we define $|p|_{B,t}$ as

$$|p|_{B,t} = \left(\mathrm{E}_B[|p(B)|^t]\right)^{1/t},$$

where above $B$ is an $n$-dimensional Bernoulli random variable.

We also have the analogue of Lemma 3. In particular, we have that:

Lemma 25 (Bonami [2]). For $p : \mathbb{R}^n \to \mathbb{R}$ a degree-$d$ polynomial, and $t \ge 2$ we have that

$$|p|_{B,t} \le \sqrt{t-1}^{\,d}\, |p|_{B,2}.$$

Yielding the Corollary

Corollary 26. For $p : \mathbb{R}^n \to \mathbb{R}$ a degree-$d$ polynomial and $N > 0$,

$$\Pr_B(|p(B)| > N |p|_{B,2}) = O\left(2^{-(N/2)^{2/d}}\right).$$

The proof is analogous to that of Corollary 5.
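As a sanity check, the inequality of Lemma 25 can be verified by brute force on a small example. The sketch below enumerates the hypercube exactly; the polynomial `p` and the helper name `bernoulli_norm` are hypothetical choices made for illustration and are not taken from the paper.

```python
from itertools import product
from math import sqrt


def bernoulli_norm(f, n, t):
    """Compute |f|_{B,t} = (E_B |f(B)|^t)^(1/t) by enumerating {-1, 1}^n."""
    points = list(product((-1, 1), repeat=n))
    mean = sum(abs(f(b)) ** t for b in points) / len(points)
    return mean ** (1.0 / t)


def p(b):
    # Hypothetical degree-2 multilinear polynomial in 4 variables.
    return b[0] * b[1] + b[1] * b[2] + b[2] * b[3] + 0.5 * b[0]


d, n = 2, 4
norm2 = bernoulli_norm(p, n, 2)

# For each t, pair the left side |p|_{B,t} with the bound sqrt(t-1)^d |p|_{B,2}.
checks = {
    t: (bernoulli_norm(p, n, t), sqrt(t - 1) ** d * norm2)
    for t in (2, 3, 4, 6, 8)
}
```

At $t = 2$ the two sides coincide, and for larger $t$ the bound has room to spare on this example.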

We will also need a result combining Lemmas 3 and 25:

Lemma 27. Let $p$ be a degree-$d$ polynomial, $B$ a Bernoulli random variable, $G$ a Gaussian random variable, and $t \ge 2$ a real number. Then

$$\mathrm{E}[|p(G,B)|^t] \le (t-1)^{dt/2}\, \mathrm{E}[p(G,B)^2]^{t/2}.$$

Proof. For integers $N$, let $G^N$ be a random variable defined by $G^N = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} A^i$, where the $A^i$ are independent Bernoulli random variables. Clearly the coordinates of $G^N$ are independent and by the central limit theorem, as $N \to \infty$, their distributions converge to Gaussians in cdf distance. This implies that for any $\epsilon > 0$ and for sufficiently large $N$ we can have correlated copies of the random variables $G$ and $G^N$ so that $|G - G^N| < \epsilon$ with probability $1 - \epsilon$. Furthermore, with probability $1 - \epsilon$, $|G| = O_n(\log(\epsilon^{-1}))$ (here $n$ is the number of coordinates of $G$). Therefore, for sufficiently large $N$ we have with probability $1 - O(\epsilon)$ that $|p(G,B) - p(G^N,B)| = O_p(\epsilon \log(\epsilon^{-1})^d)$ (this follows from considering every possible value of $B$ separately). Furthermore, note that $\mathrm{E}[|p(G,B)|^{2t}]$ and $\mathrm{E}[|p(G^N,B)|^{2t}]$ are both finite. Thus, for any indicator random variable $I$ we have that

$$\mathrm{E}[|p(G,B)|^t I] \le \mathrm{E}[|p(G,B)|^{2t}]^{1/2}\, \mathrm{E}[I]^{1/2} = O_{p,t}(\mathrm{E}[I]^{1/2}).$$

Similarly,

$$\mathrm{E}[|p(G^N,B)|^t I] = O_{p,t}(\mathrm{E}[I]^{1/2}).$$

Therefore, if $|p(G,B) - p(G^N,B)| = O_p(\epsilon \log(\epsilon^{-1})^d)$ with probability $1 - O(\epsilon)$, let $I$ be the indicator random variable for the event that this fails. Then

EMG,B)\']-EMG'',B)\'] 

< Ot{l)EMG,B) -p{G'',B)\\p{G,B)+p{G'',B)t^] 

< Ot(l)E[|p(G,B) -p(G^,B)|]1/2e[|p(G,5)|2*-1/2 + [^(G'iV^ ^)|2t-l/2]l/2 

= Op^l) {E[\piG,B) -p{G'',B)\{l - I)]+E[\p{G,B)-p{G^,B)\I]f' 

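Therefore

$$\lim_{N \to \infty} \mathrm{E}[|p(G^N,B)|^t] = \mathrm{E}[|p(G,B)|^t].$$

On the other hand, note that $p(G^N,B)$ can be thought of as a polynomial evaluated at a Bernoulli random variable in perhaps a greater number of dimensions. Hence by Lemma 25

$$\mathrm{E}[|p(G^N,B)|^t] \le (t-1)^{dt/2}\, \mathrm{E}[p(G^N,B)^2]^{t/2}.$$

Thus,

$$\mathrm{E}[|p(G,B)|^t] = \lim_{N \to \infty} \mathrm{E}[|p(G^N,B)|^t] \le \lim_{N \to \infty} (t-1)^{dt/2}\, \mathrm{E}[p(G^N,B)^2]^{t/2} = (t-1)^{dt/2}\, \mathrm{E}[p(G,B)^2]^{t/2}.$$

□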

We also note the following relationship between the Gaussian and Bernoulli norms:

Lemma 28. If $p : \mathbb{R}^n \to \mathbb{R}$ is a multilinear polynomial then $|p|_2 = |p|_{B,2}$.

Proof. This follows immediately after noting that the basis $\prod_i x_i^{a_i}$ for $a \in \{0,1\}^n$ is an orthonormal basis of the set of multilinear polynomials with respect to both the Bernoulli and Gaussian measures.

□

6.1.3 Influence and Regularity

The primary obstruction to a multilinear polynomial behaving similarly when evaluated at Bernoulli inputs rather than Gaussian inputs is when some single coordinate has undue effect on the output value of the polynomial. In such a case, the fact that this coordinate is distributed as a Bernoulli rather than a Gaussian may cause significant change to the resulting distribution. In order to quantify the extent to which this can happen we define the $i$-th influence of a coordinate as follows:

Definition. For $p : \mathbb{R}^n \to \mathbb{R}$ a function we define the $i$-th influence of $p$ to be

$$\mathrm{Inf}_i(p) := \left|\frac{\partial p}{\partial x_i}\right|_2^2.$$

It should be noted that for multilinear polynomials $p$, this is equivalent to the more standard definition

$$\mathrm{Inf}_i(p) = \mathrm{E}_A[\mathrm{Var}_{a_i}(p(A))].$$

This is the expectation over uniform independent $\{-1, 1\}$ choices for the coordinates other than the $i$-th coordinate of the variance of the resulting function over a Bernoulli choice of the $i$-th coordinate. Equivalently, it is

$$\frac{1}{4}\, \mathrm{E}_A\left[\left|p(a_1, \ldots, a_{i-1}, -1, a_{i+1}, \ldots, a_n) - p(a_1, \ldots, a_{i-1}, 1, a_{i+1}, \ldots, a_n)\right|^2\right].$$

We now prove some basic facts about the influence.



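Lemma 29. If $p : \mathbb{R}^n \to \mathbb{R}$ is a polynomial, $\mathrm{Inf}_i(p)$ is $\sum_a a_i |c_a(p)|^2$, where $c_a(p)$ are the Hermite coefficients of $p$.

Proof. Recall that

$$\frac{\partial h_a}{\partial x_i} = \sqrt{a_i}\, h_{a - e_i}.$$

Therefore, we have that

$$\frac{\partial p}{\partial x_i} = \sum_a \sqrt{a_i}\, c_a(p)\, h_{a - e_i}.$$

Thus,

$$\left|\frac{\partial p}{\partial x_i}\right|_2^2 = \sum_a a_i |c_a(p)|^2.$$

□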



From this we have

Corollary 30. For $p$ a degree-$d$ polynomial in $n$ variables,

$$\sum_{i=1}^{n} \mathrm{Inf}_i(p) = \sum_{k=1}^{d} k\, |p^{[k]}|_2^2 = \Theta_d(\mathrm{Var}(p(X))).$$
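For multilinear polynomials these facts can be checked by direct enumeration. The sketch below is only an illustration under stated assumptions: polynomials are stored as dictionaries from tuples of variable indices to coefficients, and the example polynomial and function names are hypothetical. It compares the coefficient formula of Lemma 29 with the variance-based definition of influence, and checks the total-influence identity of Corollary 30 along with the norm identity of Lemma 28.

```python
from itertools import product


def evaluate(coeffs, b):
    """Evaluate a multilinear polynomial {tuple of variable indices: coefficient}."""
    total = 0
    for subset, c in coeffs.items():
        term = c
        for i in subset:
            term *= b[i]
        total += term
    return total


def influence_from_coefficients(coeffs, i):
    """Sum of squared coefficients over the monomials containing x_i."""
    return sum(c * c for subset, c in coeffs.items() if i in subset)


def influence_from_variance(coeffs, n, i):
    """E over the other coordinates of (p(.., 1, ..) - p(.., -1, ..))^2 / 4."""
    total = 0
    for rest in product((-1, 1), repeat=n - 1):
        plus = rest[:i] + (1,) + rest[i:]
        minus = rest[:i] + (-1,) + rest[i:]
        diff = evaluate(coeffs, plus) - evaluate(coeffs, minus)
        total += diff * diff
    return total / (4 * 2 ** (n - 1))


def bernoulli_second_moment(coeffs, n):
    """E_B[p(B)^2] by enumeration of the hypercube."""
    return sum(evaluate(coeffs, b) ** 2 for b in product((-1, 1), repeat=n)) / 2 ** n


# Hypothetical example: p = 1 + 2*x0 - 3*x0*x1 + x1*x2 + 4*x0*x1*x2
p = {(): 1, (0,): 2, (0, 1): -3, (1, 2): 1, (0, 1, 2): 4}
n = 3
by_coefficients = [influence_from_coefficients(p, i) for i in range(n)]
by_variance = [influence_from_variance(p, n, i) for i in range(n)]
total_influence = sum(by_coefficients)
weighted_by_degree = sum(len(s) * c * c for s, c in p.items())
variance = sum(c * c for s, c in p.items() if s)
sum_of_squares = sum(c * c for c in p.values())
```

On this example the total influence lies between the variance and $d$ times the variance, as the $\Theta_d$ in Corollary 30 suggests.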



We now make the following definition (which agrees with the standard ones up to changing $\tau$ by a factor of $O_d(1)$):

Definition. Let $p$ be a degree-$d$ multilinear polynomial. We say that $p$ is $\tau$-regular if for each $i$

$$\mathrm{Inf}_i(p) \le \tau\, \mathrm{Var}(p(A)).$$

In terms of this notion of regularity, the standard invariance principle, proved in [17], can be stated as follows:

Theorem 31 (The Invariance Principle (Mossel, O'Donnell, and Oleszkiewicz)). If $p$ is a $\tau$-regular, degree-$d$ multilinear polynomial, $A$ and $X$ are Bernoulli and Gaussian random variables respectively and $t \in \mathbb{R}$, then

$$|\Pr(p(X) \le t) - \Pr(p(A) \le t)| = O(d\tau^{1/(8d)}).$$

It should be noted that the dependence on $\tau^{1/d}$ in the error of Theorem 31 is necessary. In particular, if $d$ is even and $N$ is a sufficiently large integer, consider the polynomial $p : \mathbb{R}^{N+1} \to \mathbb{R}$ defined by

$$p(x_0, \ldots, x_N) = \tau x_0 + \Big(\frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i\Big)^d.$$

Let $q = L(p)$. It is not hard to see that by making $N$ sufficiently large, one can make $|p - q|_2$ arbitrarily small, and thus, by Lemma 2 and Corollary 5 we can make the probability distributions for $p(X)$ and $q(X)$ arbitrarily close. It is also not hard to see that $q$ is $O_d(\tau^2)$-regular. This is because $\mathrm{Inf}_0(q) = \tau^2$, $\mathrm{Inf}_i(q) = O_d(N^{-1})$ for $i \ne 0$, and $\mathrm{Var}(q(A)) = \Theta_d(1)$. On the other hand, it is clear that for Bernoulli input $A$ we have that

$$q(A) = p(A) \ge -\tau.$$

On the other hand, considering the distribution of values of $p(X)$ (which as stated can be arbitrarily close to that of $q(X)$), if we let $y = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} x_i$, we note that $x_0$ and $y$ are independent Gaussians. Thus, with probability $\Theta(\tau^{1/d})$ we have that $x_0 < -2$ and $|y| < \tau^{1/d}$. If these occur, then $p(X) < -\tau$. Thus, for sufficiently large $N$ the difference between the probabilities that $q(A) < -\tau$ and that $q(X) < -\tau$ can be as large as $\Omega(\tau^{1/d})$.
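The two halves of this example are easy to see numerically for small parameters. The following sketch uses hypothetical values $\tau = 0.1$, $d = 4$, $N = 8$, chosen only for illustration. It enumerates the hypercube to confirm that $p(A) \ge -\tau$ always, and exhibits one real input with $x_0 < -2$ and $y = 0$ at which $p$ drops below $-\tau$.

```python
from itertools import product
from math import sqrt


def p(x, tau, d):
    """p(x_0, ..., x_N) = tau * x_0 + ((x_1 + ... + x_N) / sqrt(N))^d."""
    n = len(x) - 1
    y = sum(x[1:]) / sqrt(n)
    return tau * x[0] + y ** d


tau, d, N = 0.1, 4, 8

# Bernoulli side: minimum of p over all of {-1, 1}^(N+1).
min_on_cube = min(p(a, tau, d) for a in product((-1, 1), repeat=N + 1))

# Gaussian side: one admissible real input with x_0 < -2 and y = 0.
witness = (-2.5,) + (1.0, -1.0) * (N // 2)
value_at_witness = p(witness, tau, d)
```

The witness is a single point rather than a probability estimate; the $\Theta(\tau^{1/d})$ mass claimed in the text comes from the whole region $x_0 < -2$, $|y| < \tau^{1/d}$.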

The essential problem in the above example is that although the first coordinate has low influence, there is a reasonable probability that the size of $q(X)$ will be comparable to $\tau$, and in the case when $|q(X)|$ is small, the relative effect of the first coordinate is much larger. We get around this problem by introducing a new concept of regularity involving the idea of a diffuse decomposition.
The problem above came from the fact that the probability distribution of q{X) was too clustered 
near 0. Since the analogue of this cannot happen for a diffuse set of polynomials, we expect to 
obtain better bounds. 

Definition. For $p$ a degree-$d$ multilinear polynomial, we say that $p$ has a $(\tau, N, m, \epsilon)$-regular decomposition if there exists a polynomial $p_0$ of degree $d$ so that

• $|p - p_0|_{B,2}^2 \le \epsilon^2\, \mathrm{Var}(p_0(X))$.

• $p_0$ has a $(\tau^{1/5}, N)$-diffuse decomposition of size $m$, $(h, q_1, \ldots, q_m)$, so that $q_i$ is multilinear for each $i$ and $\mathrm{Inf}_j(q_i) \le \tau$ for each $i, j$.

Theorem 32 (The Diffuse Invariance Principle). If $p$ is a degree-$d$ multilinear polynomial that has a $(\tau, N, m, \epsilon)$-regular decomposition for $1/2 > \epsilon, \tau > 0$, $A$ and $X$ are random Bernoulli and Gaussian variables respectively and $t$ is a real number, then

$$|\Pr(p(A) \le t) - \Pr(p(X) \le t)| = O_{d,m}\left(\tau^{1/5} N \log(\tau^{-1})^{dm/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right).$$

Remark. We can derive a statement very similar to that of Theorem 31 from Theorem 32. In particular, if $p$ is multilinear and $\tau$-regular, we may normalize $p$ so that $\mathrm{E}[p(X)] = 0$, $\mathrm{E}[p(X)^2] = 1$. Then by Lemma 2 we have that for $h = \mathrm{Id}$ and $q = p$, $(h, q)$ is a $(\tau^{1/5}, O(d\tau^{(1/d-1)/5}))$-diffuse decomposition of $p$. Furthermore, by assumption $q$ is multilinear and has all influences at most $\tau$. Therefore, this is a $(\tau, O(d\tau^{(1/d-1)/5}), 1, 0)$-regular decomposition of $p$. Thus, we obtain

$$|\Pr(p(A) \le t) - \Pr(p(X) \le t)| = O_d\left(\tau^{1/(5d)} \log(\tau^{-1})^{d/2+1}\right).$$

Neither invariance principle on its own is very useful for dealing with general polynomial thresh- 
old functions which might not satisfy the necessary regularity conditions. Fortunately, in both cases 
if regularity fails it will be because some small number of coordinates have undue effect on the value
of the polynomial. If this is the case, we can hope to make things better by fixing the values of 
these coordinates and considering the resulting polynomial over the remaining coordinates, hoping 
that it is regular. Theorems confirming this intuition have been known as regularity lemmas. For 
the standard notions of regularity, various regularity lemmas have appeared in [6] and [3] as well 
as other places. As an example, [6] proved the following: 

Theorem 33 (Diakonikolas, Servedio, Tan, Wan). Let $f(x) = \mathrm{sign}(p(x))$ be any degree-$d$ PTF. Fix any $\tau > 0$. Then $f$ is equivalent to a decision tree $T$, of depth

$$\mathrm{depth}(d, \tau) = \frac{1}{\tau} \cdot (d \log(\tau^{-1}))^{O(d)}$$

with variables at the internal nodes and a degree-$d$ PTF $f_\rho = \mathrm{sign}(p_\rho)$ at each leaf $\rho$, with the following property: with probability at least $1 - \tau$, a random path from the root reaches a leaf $\rho$ such that $f_\rho$ is $\tau$-close to some $\tau$-regular degree-$d$ PTF.

Along similar lines, we prove the following: 

Theorem 34 (Diffuse Regularity Lemma). Let $p$ be a degree-$d$ polynomial with Bernoulli inputs. Let $\tau, c, M > 0$ with $\tau < 1/2$. Then $p$ can be written as a decision tree of depth at most

$$O_{c,d,M}\left(\tau^{-1} \log(\tau^{-1})^{O(d)}\right)$$

with variables at the internal nodes and a degree-$d$ polynomial at each leaf, with the following property: with probability at least $1 - \tau$, a random path from the root reaches a leaf $\rho$ so that the corresponding polynomial $p_\rho$ either satisfies $\mathrm{Var}(p_\rho) < \tau^M |p_\rho|_2^2$ or $p_\rho$ has a $(\tau, \tau^{-c}, O_{c,d,M}(1), O_{c,d,M}(\tau^M))$-regular decomposition.

6.2 The Diffuse Invariance Principle 

In this section, we prove Theorem 32. We begin with the following proposition:

Proposition 35. Let $p$ be a degree-$d$ polynomial with a $(\tau^{1/5}, N)$-diffuse decomposition (for $1/2 > \tau > 0$) $(h, q_1, \ldots, q_m)$ with $q_i$ multilinear so that $\mathrm{Inf}_j(q_i) \le \tau$ for all $i, j$. Then if $A$ is a Bernoulli random variable, $X$ a Gaussian random variable and $t$ a real number then

$$|\Pr(p(A) \le t) - \Pr(p(X) \le t)| = O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1}\right).$$

Proof. It suffices to prove this statement for $t = 0$. We proceed via Proposition 21. We note that for each $i$ the first three moments of $A_i$ agree with the corresponding moments of $X_i$. Therefore, since the $q_i$ are multilinear, any degree-3 polynomial in the $q_i$ has the same expectation under $A$ as under $X$. Thus, we may apply Proposition 21 with $k = 4$. We have that

$$|\Pr(p(A) < 0) - \Pr(p(X) < 0)| = O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1} + \tau^{-4/5} T\right). \tag{12}$$

Recall that $T_{i,j}$ is

$$\mathrm{E}\left[(q_i(X_1, \ldots, X_{j-1}, A_j, \ldots, A_n) - \mathrm{E}_Y[q_i(X_1, \ldots, X_{j-1}, Y, A_{j+1}, \ldots, A_n)])^4\right]$$
$$+\ \mathrm{E}\left[(q_i(X_1, \ldots, X_j, A_{j+1}, \ldots, A_n) - \mathrm{E}_Y[q_i(X_1, \ldots, X_{j-1}, Y, A_{j+1}, \ldots, A_n)])^4\right].$$

By Lemma 27 this is at most

$$O_d\left(\mathrm{E}_{X_1, \ldots, X_{j-1}, A_{j+1}, \ldots, A_n}[\mathrm{Var}_Y(q_i(X_1, \ldots, X_{j-1}, Y, A_{j+1}, \ldots, A_n))]^2\right).$$

Since the polynomial in the expectation is at most quadratic in each $X_j$, this is

$$O_d\left(\mathrm{E}_A[\mathrm{Var}_Y(q_i(A_1, \ldots, A_{j-1}, Y, A_{j+1}, \ldots, A_n))]^2\right) = O_d(\mathrm{Inf}_j(q_i)^2).$$

Thus,

$$T = \sum_{i=1}^{m} \sum_{j=1}^{n} T_{i,j} \le O_d\Big(\sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{Inf}_j(q_i)^2\Big) \le O_d\Big(\tau \sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{Inf}_j(q_i)\Big) = O_{d,m}(\tau).$$

Thus, by Equation (12),

$$|\Pr(p(A) < 0) - \Pr(p(X) < 0)| = O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1}\right),$$

as desired.

□

Proposition 35 is the main analytic tool used in our proof of Theorem 32. From it we can quickly derive the following theorem:

Theorem 36. Let $p$ be a degree-$d$ multilinear polynomial with a $(\tau, N, m, \epsilon)$-regular decomposition (for $1/2 > \epsilon, \tau > 0$) given by $(h, q_1, \ldots, q_m)$. Let $p_0(x) := h(q_1(x), \ldots, q_m(x))$. Let $A$ be a Bernoulli random variable, $X$ a Gaussian random variable, and $t$ a real number. Then

$$|\Pr(p(A) \le t) - \Pr(p_0(X) \le t)| = O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right).$$

Remark. For most applications Theorem 36 will be as good as Theorem 32, as it shows that the regular polynomial of Bernoullis behaves similarly to a polynomial of Gaussians. As we shall see later, it will take some work to show that it will necessarily behave like the same polynomial of Gaussians. This is because although $|p - p_0|_{B,2}$ is small, this does not immediately imply that $|p - p_0|_2$ is sufficiently small for the proof to work.

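Proof. As in the proof of Proposition 35, we may assume that $t = 0$ and prove the inequality

$$\Pr(p(A) < 0) \le \Pr(p_0(X) < 0) + O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right).$$

By Corollary 26 we have with probability $1 - \epsilon$ that

$$|p(A) - p_0(A)| \le O(\epsilon \log(\epsilon^{-1})^{d/2}) \sqrt{\mathrm{Var}(p_0(X))} \le O(\epsilon \log(\epsilon^{-1})^{d/2})\, |p_0|_2.$$

Thus, we have that

$$\Pr(p(A) < 0) \le \epsilon + \Pr\left(p_0(A) < O(\epsilon \log(\epsilon^{-1})^{d/2})\, |p_0|_2\right)$$

$$\le O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1} + \epsilon\right) + \Pr\left(p_0(X) < O(\epsilon \log(\epsilon^{-1})^{d/2})\, |p_0|_2\right)$$

$$\le O_{d,m}\left(N \tau^{1/5} \log(\tau^{-1})^{dm/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right) + \Pr(p_0(X) < 0).$$

The second line above is by Proposition 35 and the third is by Lemma 2.

The lower bound on $\Pr(p(A) < 0)$ is proved analogously. □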

In order to complete the proof of Theorem 32 we need the following:

Proposition 37. If $p$ is a degree-$d$ polynomial with a $(\tau, N, m, \epsilon)$-regular decomposition (for $1/2 > \epsilon, \tau > 0$) given by $p_0(x) = h(q_1(x), \ldots, q_m(x))$, then for $X$ a Gaussian random variable, and $t$ a real number,

$$|\Pr(p(X) \le t) - \Pr(p_0(X) \le t)| \le O_{d,m}\left(\tau^{1/4} N \log(\tau^{-1})^{d(m+1)/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right).$$

The biggest difficulty with proving this Proposition will be dealing with the discrepancy between $p_0$ and $L(p_0)$. To deal with this, we make the following definition:

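Definition. Let $p_1, \ldots, p_k$ be multilinear polynomials. Define

$$A(p_1, \ldots, p_k) = \sum_{S \subseteq \{1, 2, \ldots, k\}} (-1)^{|S|} \Big(\prod_{i \in S} p_i\Big) L\Big(\prod_{i \notin S} p_i\Big).$$

We note the following:

Lemma 38. Let $q_1, \ldots, q_m$ be multilinear polynomials and let $h$ be a degree-$d$ polynomial in $m$ variables. Then $L(h(q_1(x), \ldots, q_m(x)))$ is

$$\sum_{k=0}^{d} \sum_{i_1, \ldots, i_k = 1}^{m} \frac{1}{k!}\, A(q_{i_1}, \ldots, q_{i_k})(x)\, D_{i_1} \cdots D_{i_k} h(q_1(x), \ldots, q_m(x)).$$

Proof. As the above expression is linear in $h$, we may assume that $h$ is a monomial of degree $d$. In particular we may assume that $h(q_1, \ldots, q_m) = q_{i_1} q_{i_2} \cdots q_{i_d}$ (note that some of the indices $i_j$ might coincide). The expression in question then becomes:

$$\sum_{T = \{t_1, \ldots, t_k\} \subseteq \{1, \ldots, d\}} \Big(\prod_{j \notin T} q_{i_j}\Big) A(q_{i_{t_1}}, \ldots, q_{i_{t_k}})$$

$$= \sum_{T \subseteq \{1, \ldots, d\}} \Big(\prod_{j \notin T} q_{i_j}\Big) \sum_{S \subseteq T} (-1)^{|S|} \Big(\prod_{j \in S} q_{i_j}\Big) L\Big(\prod_{j \in T \setminus S} q_{i_j}\Big)$$

$$= \sum_{S \subseteq T \subseteq \{1, \ldots, d\}} (-1)^{|S|} \Big(\prod_{j \notin T \setminus S} q_{i_j}\Big) L\Big(\prod_{j \in T \setminus S} q_{i_j}\Big).$$

Letting $R = T \setminus S$, this is

$$\sum_{R \subseteq \{1, \ldots, d\}} L\Big(\prod_{j \in R} q_{i_j}\Big) \Big(\prod_{j \notin R} q_{i_j}\Big) \sum_{S \subseteq \{1, \ldots, d\} \setminus R} (-1)^{|S|}$$

$$= \sum_{R = \{1, \ldots, d\}} L\Big(\prod_{j \in R} q_{i_j}\Big) \Big(\prod_{j \notin R} q_{i_j}\Big) = L(q_{i_1} q_{i_2} \cdots q_{i_d}),$$

as desired. □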
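The combinatorial identity in this proof can be checked mechanically with exact integer arithmetic. The sketch below is an illustration only. It assumes the reading of the definition in which $L$ is applied to the product over the complement of $S$, stores polynomials as dictionaries from exponent tuples to coefficients, and uses three made-up multilinear polynomials; all function names are hypothetical.

```python
from itertools import combinations


def mul(p, q):
    """Product of two polynomials in {exponent tuple: coefficient} form."""
    out = {}
    for e1, c1 in p.items():
        for e2, c2 in q.items():
            e = tuple(a + b for a, b in zip(e1, e2))
            out[e] = out.get(e, 0) + c1 * c2
    return {e: c for e, c in out.items() if c != 0}


def add(p, q, sign=1):
    """Return p + sign * q."""
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) + sign * c
    return {e: c for e, c in out.items() if c != 0}


def product_of(polys, n):
    """Product of a list of polynomials in n variables (empty product is 1)."""
    out = {(0,) * n: 1}
    for p in polys:
        out = mul(out, p)
    return out


def L(p):
    """Multilinearization: reduce exponents mod 2 and merge like terms."""
    out = {}
    for e, c in p.items():
        r = tuple(a % 2 for a in e)
        out[r] = out.get(r, 0) + c
    return {e: c for e, c in out.items() if c != 0}


def A(polys, n):
    """Sum over S of (-1)^|S| * (prod_{i in S} p_i) * L(prod_{i not in S} p_i)."""
    k = len(polys)
    total = {}
    for size in range(k + 1):
        for S in combinations(range(k), size):
            outside = product_of([polys[i] for i in S], n)
            inside = L(product_of([polys[i] for i in range(k) if i not in S], n))
            total = add(total, mul(outside, inside), sign=(-1) ** size)
    return total


def monomial_case_rhs(polys, n):
    """Sum over T of (prod_{j not in T} q_j) * A(q_j : j in T)."""
    d = len(polys)
    total = {}
    for size in range(d + 1):
        for T in combinations(range(d), size):
            rest = product_of([polys[j] for j in range(d) if j not in T], n)
            total = add(total, mul(rest, A([polys[j] for j in T], n)))
    return total


# Hypothetical multilinear polynomials in x0, x1, x2.
q1 = {(1, 1, 0): 1, (0, 0, 1): 1}                 # x0*x1 + x2
q2 = {(1, 0, 0): 1, (0, 1, 1): -2}                # x0 - 2*x1*x2
q3 = {(0, 1, 0): 1, (1, 0, 1): 1, (0, 0, 0): 1}   # x1 + x0*x2 + 1
lhs = L(product_of([q1, q2, q3], 3))
rhs = monomial_case_rhs([q1, q2, q3], 3)
```

Two small consequences are also visible here: `A` of a single multilinear polynomial vanishes, and `A` of a pair equals $L(q_1 q_2) - q_1 q_2$.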

To control the discrepancy between $p_0$ and $L(p_0)$ it now suffices to prove the following:

Proposition 39. Let $p_1, \ldots, p_k$ be multilinear polynomials of degree at most $d$ with $\mathrm{Inf}_i(p_j) \le \tau$ for all $i, j$, and $|p_j|_2 \le 1$ for all $j$. Then

$$|A(p_1, \ldots, p_k)|_2 = O_{k,d}(\tau^{k/4}).$$

Proof. We proceed by bounding the expected value of $A(p_1, \ldots, p_k)^2$. In particular, we show that if the $p_j$ are multilinear degree-$d$ polynomials of norm at most 1 with all influences at most $\tau$ then

$$\mathrm{E}[A(p_1, \ldots, p_k)(X)\, A(p_{k+1}, \ldots, p_{2k})(X)] = O_{k,d}(\tau^{k/2}).$$

We note that the above expression is linear in the $p_j$. We may therefore rewrite it as a sum, over sequences of monomials $m_1, \ldots, m_{2k}$ where $m_j$ is a monomial of $p_j$, of

$$\mathrm{E}[A(m_1, \ldots, m_k)(X)\, A(m_{k+1}, \ldots, m_{2k})(X)].$$

To each such sequence of monomials $m_1, \ldots, m_{2k}$ we associate a repeat pattern, which is the multiset of non-empty subsets of $\{1, 2, \ldots, 2k\}$ whose elements correspond to $\{j : x_i \text{ appears in monomial } m_j\}$ for all $i$ so that $x_i$ appears in any of the monomials $m_j$. We break up the above sum into parts based on the repeat pattern satisfied by $m_1, \ldots, m_{2k}$. Since there are $O_{k,d}(1)$ such possible patterns, it suffices to prove our bound for the sum of all terms coming from each such pattern. In particular we need to show that for any repeat pattern $P$ that

$$\sum_{\substack{m_j \text{ a monomial from } p_j \\ (m_1, \ldots, m_{2k}) \text{ has repeat pattern } P}} \mathrm{E}[A(m_1, \ldots, m_k)(X)\, A(m_{k+1}, \ldots, m_{2k})(X)] = O_{k,d}(\tau^{k/2}). \tag{13}$$

Note that if the repeat pattern contains any subset of odd size then the resulting sum will be 0. This is because for any $m_1, \ldots, m_{2k}$ with this repeat pattern, there will be some $x_i$ appearing in an odd number of the $m_j$. This means that the product of the $m_j$ will be an odd function of $x_i$. Since $L$ of an odd polynomial is still odd, this means that $A(m_1, \ldots, m_k) A(m_{k+1}, \ldots, m_{2k})$ will be an odd function of $x_i$ and thus have expectation 0.

Furthermore, suppose that given $P$, there is some $1 \le j \le 2k$ so that $j$ does not appear in any element of $P$ of size greater than 2. We claim again that for any $m_1, \ldots, m_{2k}$ satisfying $P$ that $\mathrm{E}[A(m_1, \ldots, m_k)(X)\, A(m_{k+1}, \ldots, m_{2k})(X)] = 0$. To show this we assume without loss of generality that $j = 1$. We expand out the $A$'s to get that the expression in question is the expectation of

E E (-1)'*'^' f n f n p]4 n p, 

sc{i,2,...,fc}Tc{fc+i,...,2fc} \je5uT / \je{i,...,k}\s j \ie{fc+i,...,2fc}\T 

We claim that if we toggle whether 1 is in $S$ in the above sum, it has no effect on the expectation of the resulting product other than to negate the $(-1)^{|S|+|T|}$ term. This is because adding 1 to $S$ can only have the effect of removing some $x_i^2$ terms from the resulting monomial. On the other hand, since $\mathrm{E}[1] = \mathrm{E}[X_i^2]$, this does not affect the resulting expectation. Thus, the expectations of the terms with 1 in $S$ cancel the expectations of the terms with 1 not in $S$, leaving us with expectation 0.

It thus suffices to consider Equation (13) when all elements of $P$ have even order and so that for each $1 \le j \le 2k$ there is some element of $P$ of order at least 4 containing $j$. For such $P$ we upper bound the left hand side of Equation (13) by



E 




Oi,,l nimib . (14) 



m,j a. monomial from pj 
(mi,...,m2fc) has repeat pattern P 

We will now prove the following statement, which will imply our desired bound. Let $p_1, \ldots, p_{2k}$ be multilinear polynomials with $|p_j|_2 \le 1$ and $T \subseteq \{1, 2, \ldots, 2k\}$ some set so that $\mathrm{Inf}_i(p_j) \le \tau$ for all $i$ and all $j \in T$. Furthermore, let $P$ be a repeat pattern all of whose elements have even order and so that each element of $T$ appears in some element of $P$ of order at least 4. Then the expression in Equation (14) is at most $O_{k,d}(\tau^{|T|/4})$. We prove this by induction on $|P|$. The base case where $|P| = 0$ is trivial since then we are considering only the term where all of the $m_j$ are constants.

If $|P| > 0$, we consider an element of $P$ of maximal size. In particular, if $T \ne \emptyset$, this implies that this element is of size at least 4. Without loss of generality this element is $\{1, 2, \ldots, 2\ell\}$. We break our sum into pieces based on which coordinate is shared by all of $m_1, \ldots, m_{2\ell}$ (if more than one coordinate is shared by each of these elements we will count all of them, leading to a strictly larger sum). If we wish to compute the sum over all terms where they share a coordinate $x_i$ we find that it is

rrtj a monomial from p'^ 
(mi ,...,m2k) has repeat pattern P' 
where above $p_j' = p_j$ for $j > 2\ell$ and for $j \le 2\ell$, $p_j'$ consists of the sum of the monomials in $p_j$ containing $x_i$ divided by $x_i$, and $P'$ is $P$ minus $\{1, 2, \ldots, 2\ell\}$. Furthermore note that $|p_j'|_2 = \sqrt{\mathrm{Inf}_i(p_j)}$ for $j \le 2\ell$. Letting $p_j''$ be the normalized version of $p_j'$, the above is at most



2i 



j=l mj a monomial from pj" \i=l 

(mi ,...,m2fc) has repeat pattern P' 

Letting $T' = T \setminus \{1, 2, \ldots, 2\ell\}$, we note that this sum is of the form specified for the value $T'$, hence we have by the inductive hypothesis that the above sum is

2e 



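It thus suffices to prove that

$$\sum_i \prod_{j=1}^{2\ell} \sqrt{\mathrm{Inf}_i(p_j)} = O_d\left(\tau^{|T \cap \{1, \ldots, 2\ell\}|/4}\right).$$

We assume without loss of generality that $T \cap \{1, 2, \ldots, 2\ell\} = \{1, 2, \ldots, a\}$. We note by Cauchy-Schwarz that

$$\sum_i \prod_{j=1}^{2\ell} \sqrt{\mathrm{Inf}_i(p_j)} \le \Big(\max_i \prod_{j=1}^{2\ell-2} \sqrt{\mathrm{Inf}_i(p_j)}\Big) \prod_{j=2\ell-1}^{2\ell} \Big(\sum_i \mathrm{Inf}_i(p_j)\Big)^{1/2}.$$

We note that for each of the last two terms that

$$\sum_i \mathrm{Inf}_i(p_j) = O_d(|p_j|_2^2) = O_d(1).$$

Furthermore, we have that

$$\prod_{j=1}^{2\ell-2} \sqrt{\mathrm{Inf}_i(p_j)} \le \prod_{j=1}^{\min(a, 2\ell-2)} \sqrt{\tau} \prod_{j=a+1}^{2\ell-2} O_d(1) = O_d\left(\tau^{\min(a, 2\ell-2)/2}\right).$$

Thus, we have that

$$\sum_i \prod_{j=1}^{2\ell} \sqrt{\mathrm{Inf}_i(p_j)} \le O_d\left(\tau^{\min(a, 2\ell-2)/2}\right) \le O_d(\tau^{a/4}),$$

with the last step following from the observation that either $a = 0$ or $\ell \ge 2$. This completes our inductive step and proves our proposition.

□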

We are now prepared to prove Proposition 37 and thus Theorem 32.

Proof. We may clearly assume that $t = 0$. We will give a series of high probability statements that together imply that

$$\mathrm{sgn}(p(X)) = \mathrm{sgn}(p_0(X)).$$

Let $V = \mathrm{Var}(p_0)$.

First note that by assumption

$$|p - L(p_0)|_2 = |p - p_0|_{B,2} \le \epsilon V.$$

Thus, by Corollary 5 we have for some sufficiently large $C$ that with probability $1 - \epsilon$

$$|p(X) - L(p_0)(X)| \le C \epsilon \log(\epsilon^{-1})^{d/2}\, V.$$

Additionally, by Lemma 2 we have with probability $1 - O(d \epsilon^{1/d} \log(\epsilon^{-1})^{1/2})$ that

$$|p_0(X)| \ge 2C \epsilon \log(\epsilon^{-1})^{d/2}\, |p_0|_2 \ge 2C \epsilon \log(\epsilon^{-1})^{d/2}\, V.$$

By Proposition 39 and Corollary 5 we have that for $C$ a sufficiently large number given $d$, with probability $1 - O_{d,m}(\tau)$, for all $1 \le i_1, \ldots, i_k \le m$ with $k \le d$,

$$|A(q_{i_1}, \ldots, q_{i_k})(X)| \le C \tau^{k/4} \log(\tau^{-1})^{dk/2}.$$

Finally, by Lemma 18 we have that with probability $1 - O_{d,m}(\tau^{1/4} N \log(\tau^{-1})^{d(m+1)/2+1})$, letting $x = (q_1(X), \ldots, q_m(X))$, that

$$|h(x)| \ge 3Cm\, \tau^{1/4} \log(\tau^{-1})^{d/2}\, |D_{i_1} h(x)|_2 \ge 3^2 C^2 m^2\, \tau^{2/4} \log(\tau^{-1})^{2d/2}\, |D_{i_1} D_{i_2} h(x)|_2$$

$$\ge \cdots \ge 3^d C^d m^d\, \tau^{d/4} \log(\tau^{-1})^{d^2/2}\, |D_{i_1} \cdots D_{i_d} h(x)|_2.$$

Assuming that all of the above hold, then

$$|p(X) - p_0(X)| \le |p(X) - L(p_0)(X)| + |p_0(X) - L(p_0)(X)| \le |p_0(X)|/2 + |p_0(X) - L(p_0)(X)|.$$

By Lemma 38, letting $x = (q_1(X), \ldots, q_m(X))$, we have that

$$|L(p_0)(X) - p_0(X)| = \Big|\sum_{k=1}^{d} \sum_{i_1, \ldots, i_k = 1}^{m} \frac{1}{k!}\, A(q_{i_1}, \ldots, q_{i_k})(X)\, D_{i_1} \cdots D_{i_k} h(x)\Big|$$

$$\le \sum_{k=1}^{d} \sum_{i_1, \ldots, i_k = 1}^{m} |A(q_{i_1}, \ldots, q_{i_k})(X)|\, |D_{i_1} \cdots D_{i_k} h(x)|$$

$$\le \sum_{k=1}^{d} \sum_{i_1, \ldots, i_k = 1}^{m} 3^{-k} m^{-k}\, |h(x)|$$

$$\le \sum_{k=1}^{d} 3^{-k}\, |p_0(X)|$$

$$\le |p_0(X)|/2.$$

Combining this with the above we find that

$$|p(X) - p_0(X)| \le |p_0(X)|/2 + |p_0(X)|/2 = |p_0(X)|.$$

Thus, with probability at least

$$1 - O_{d,m}\left(\tau^{1/4} N \log(\tau^{-1})^{d(m+1)/2+1} + \epsilon^{1/d} \log(\epsilon^{-1})^{1/2}\right)$$

we have that $\mathrm{sgn}(p(X)) = \mathrm{sgn}(p_0(X))$.

□

6.3 The Regularity Lemma

In this section, we will prove Theorem 34. Much of it will be along the lines of the proof of Theorem 1 with some extra work being done to ensure that the resulting $q_i$ are regular. We begin with a lemma on the regularity of restrictions of polynomials.

Lemma 40. Let $p$ be a degree-$d$ multilinear polynomial with $|p|_2 \le 1$. Let $1/2 > \epsilon > 0$ be a real number. Then there exists an $M = O_d(\epsilon^{-1} \log(\epsilon^{-1})^d)$ so that for any set $S$ of coordinates containing the $M$ coordinates of highest influence for $p$, if we let $A$ be a random Bernoulli variable over the coordinates in $S$ and let $p_A$ be the polynomial over the remaining coordinates upon plugging these values into the coordinates of $S$, then with probability $1 - \epsilon$

$$\max_j(\mathrm{Inf}_j(p_A)) \le \epsilon.$$

Proof. We assume throughout that e is sufficiently small. Note that the sum of the influences of 
p is Orf(l), therefore if M is a sufficiently large multiple of log(e~^)'^, we have that the largest 
influence of a coordinate not in S is at most a small constant times elog(e~^)~'^. Note that for each 
i ^ S, there is a polynomial pi of degree at most d so that Infj(pyi) = pi{A)'^. Furthermore, it is 
easy to check that E[pj(^)^] = Infj(p). Applying Corollary [5] we find that if M were chosen to be 
sufficiently large, then with probability at most is any given lnii{pA) more than e. Taking a 
union bound over i, we find that with probability at most e/2 is some Infj(pyi) > e for any i with 
Infj(p) > de^. Consider the polynomial 

q{A) = Yl ^^^jiPA?- 

i:Inf,(p)<a!e3 

Note that |Infj(pA)^|2 = Od{ln{j{pf) by Lemma El Thus, 

k|2<0,(l) Yl Inf,(p)' < Od(e') E Inf,(p) = Od(e'). 

j:lnij{p)<de'i j-lnij{p)<de'i 

Thus, by Corollary [5] > with probability at most e/2. On the other hand, if q{A) < e^, it 
implies that Infj(p^) < e for all j so that Infj(p) < de^. Thus, with probability at most e is any 
Infj{pA_) more than e. □ 

Lemma 41. Let p be a degree-d multilinear polynomial. Let S be a set of coordinates and A a 
Bernoulli random variable over those coordinates. Let pA be the restricted polynomial when the 
coordinates of A are plugged into p. Then 

Pr{\pA\2>N\p\2) = Od (2'°sW'). 

Proof. Note that $|p_A|_2^2$ is a polynomial in $A$ of degree at most $2d$. Note that the squared $L^2$ norm
of this polynomial is

\[ \mathbb{E}_A\left[\mathbb{E}_B[p(A,B)^2]^2\right] \le \mathbb{E}_{A,B}\left[p(A,B)^4\right] = |p|_{4,B}^4 \le O_d(|p|_2^4). \]
The result now follows from Corollary 26. □
The main parts of the proof of Theorem 34 are contained in the following proposition.
Proposition 42. Let $p$ be a degree-$d$ multilinear polynomial and let $\epsilon, c, M > 0$ with $1/2 > \epsilon$. Then
$p$ can be written as a decision tree of depth

\[ O_{c,d,M}\left(\epsilon^{-1}\log(\epsilon^{-1})^{O(d)}\right) \]

with coordinate variables for internal nodes and polynomials for leaves so that for a random leaf $p_\rho$
we have with probability $1 - O_{c,d,M}(\epsilon)$ that there exists a $p_0$ with $|p_\rho - p_0|_{2,B} < \epsilon^{M}|p_\rho|_2$ and so that $p_0$
has an $(\epsilon, \epsilon^{-c})$-diffuse decomposition $(h, q_1, \ldots, q_m)$ with $m = O_{c,d,M}(1)$, $q_i$ multilinear and so that
$\mathrm{Inf}_j(q_i) \le \epsilon$ for each $i, j$.

Proof. The proof is along the same lines as the proof of Theorem 1, with some extra work done to
ensure that the influences can be controlled. We assume that $|p|_2 = 1$ and assume throughout that
$\epsilon$ is sufficiently small.

We define a partial decomposition of our polynomial $p$ to be a set of the following data:

• A positive integer $m$.

• A polynomial $h : \mathbb{R}^m \to \mathbb{R}$.

• A sequence of multilinear polynomials $(q_1, \ldots, q_m)$, each on $\mathbb{R}^n$, with $|q_i|_2 = 1$ for each $i$.

• A sequence of integers $(a_1, \ldots, a_m)$ with $a_i$ between $0$ and $4 \cdot 3^i(N+1)/c - 1$.

Furthermore, we require that each $q_i$ is non-constant, and that for any monomial $\prod x_i^{\alpha_i}$ appearing
in $h$ we have $\sum \alpha_i \deg(q_i) \le d$.

We say that such a partial decomposition has complexity at most C if the following hold: 



We define the weight of a partial decomposition as follows. First we define the polynomial 



We then let the weight of the decomposition be w{uj). 

We prove by ordinal induction on $w$ that if $p$ has a partial decomposition of weight $w$ and
complexity $C$, then there is a decision tree of depth $O_{c,C,d,N,w}(\epsilon^{-1}\log(\epsilon^{-1})^{O(d)})$ so that with
probability $1 - O_{c,C,d,w,N}(\epsilon)$ a random leaf has such a $p_0$ with a diffuse decomposition into multilinear
polynomials whose influences are at most $\epsilon$.

Again the idea of the proof is to show that after a decision tree of appropriate depth and with
appropriate probability, we either have such a $p_0$ or we have a partial decomposition with
smaller weight. By Lemma 40, if we restrict to random values of the $O_d(\epsilon^{-1}\log(\epsilon^{-1})^{O(d)})$ highest
influence coordinates of each of the $q_i$, we will have all influences of all of the $q_i$ at most $\epsilon$ with
probability $1 - O_{d,m}(\epsilon)$. Applying Lemma 41 to the $q_i$ and $p - h(q_1, \ldots, q_m)$ we find that with
probability $1 - O_{d,m}(\epsilon)$ the restricted values of $q_i$ have norm at most $\log(\epsilon^{-1})^{O(d)}$ and
the $L^2$ norm of $p - h(q_1, \ldots, q_m)$ increased by at most a similar factor. Thus, rescaling the $q_i$ and
modifying $h$ appropriately, we find that with probability $1 - O_{d,m}(\epsilon)$ over our restrictions, we have
a new partial decomposition of weight $w$ and complexity $O_C(1)$ so that $\mathrm{Inf}_j(q_i) \le \epsilon$ for each $i$ and



• m < C. 



• \h\2 < Ce-^+^"\ 

. \p{A) - hie''^'/^'-''\i{A))\2,B < Ce^+Mog(e-i) 



m 




i=l 
j. As in Lemma 16, we show that either $(h, q_1, \ldots, q_m)$ is an $(\epsilon, \epsilon^{-c})$-diffuse set or that we have a
partial decomposition of strictly smaller weight and with complexity $O_{c,C,d,N}(1)$. The proof follows
through identically to the proof in Lemma 16 with the additional caveat that the $A_i, B_i$ can be
chosen to be multilinear. This is because the $q_i$ are multilinear, so keeping only the multilinear
parts of the $A_i, B_i$ only reduces the error produced by the approximation. This completes the
proof.

□ 

We will need one more lemma about decision trees of polynomials before we proceed. 

Lemma 43. Let $p$ be a multilinear, degree-$d$ polynomial. Let $T$ be some decision tree over its
coordinates. If $T$ is evaluated making random, independent choices at each step, and the restricted
function is called $p_\rho$, then with probability at least $2^{-O(d)}$ over these choices we have that

\[ |p_\rho|_2 \ge |p|_2/2. \]

Proof. Given a partially filled-in decision tree $T'$, define $V(T') = \mathbb{E}[p(A)^2 \mid T']$. It is clear that $V$ is a
martingale. Therefore $V^2$ is a submartingale. In particular, this means that the expectation of $V^2$
over some decision tree is at most the expectation over an extended decision tree that eventually
decides values for all coordinates. This latter expectation is $|p|_4^4 \le 2^{O(d)}|p|_2^4$. Therefore, the
expectation over fills of $T$ of $V$ is $|p|_2^2$ and the expectation of $V^2$ is at most $2^{O(d)}|p|_2^4$. Therefore, by
the Paley-Zygmund inequality, with probability at least $2^{-O(d)}$ we have that $V \ge |p|_2^2/4$, proving our
lemma. □
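For reference, the Paley-Zygmund step in the proof above can be spelled out as follows. This is a sketch under the assumption that the garbled constants read $\mathbb{E}[V] = |p|_2^2$ and $\mathbb{E}[V^2] \le 2^{O(d)}|p|_2^4$; the choice $\theta = 1/4$ is ours and matches the threshold $|p|_2^2/4$.

```latex
% Paley-Zygmund: for a non-negative random variable V and 0 <= theta <= 1,
\[
  \Pr\bigl(V > \theta\,\mathbb{E}[V]\bigr) \;\ge\; (1-\theta)^2\,\frac{\mathbb{E}[V]^2}{\mathbb{E}[V^2]} .
\]
% Taking theta = 1/4 with E[V] = |p|_2^2 and E[V^2] <= 2^{O(d)} |p|_2^4:
\[
  \Pr\bigl(V > |p|_2^2/4\bigr) \;\ge\; \frac{9}{16}\cdot\frac{|p|_2^4}{2^{O(d)}\,|p|_2^4} \;=\; 2^{-O(d)} .
\]
```

Since $V$ at a leaf is the squared norm of the restricted polynomial, $V \ge |p|_2^2/4$ is the same as the restricted polynomial having norm at least $|p|_2/2$.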

We are now prepared to prove Theorem 34.

Proof. We claim that for $\tau$ sufficiently small, a correctly constructed decision tree of depth
$O_{c,d,M}(\tau^{-1}\log(\tau^{-1})^{O(d)})$ yields a restriction with the desired property with probability at least
$2^{-O(d)}$. Repeating this process up to $2^{O(d)}\log(\tau^{-1})$ many times upon failure will guarantee an
aggregate success probability of $1 - \tau$.

To do this we construct the decision tree given by Proposition [32] for = M + d + 2 and 
$\epsilon = \tau$. We claim that if the restricted polynomial has norm at least $|p|_2/2$ (which happens with
probability $2^{-O(d)}$ by Lemma 43), then the resulting polynomial has the desired property.

Let P be the resulting polynomial. We have a polynomial po with an appropriate diffuse de- 
composition into multilinear polynomials with sufficiently small influences and so that \P —po\2,B = 
Oc,d,M{T^'^~^'^~^'^)\P\2- If Var(po) > t'^^~^'^\P\2^ we have an appropriate regular decomposition. Oth- 
erwise, Var(po) < T*^"'"'^|P||. This implies that for some fi that \pQ — < r*^"'"'^|P||. Thus, by 
Lemma [20] we have that \h — ijl\2 < Oc.d.M (''"''^) I -Pli- From this it is easy to see that the sum of the 
squares of the coefficients of h — 1^ is Oc,d,Mir^^)\P\l- From this it is easy to verify that the variance 
of po over Bernoulli inputs is Oc^d,M{T^)\P\2- Therefore, due to the small difference between p 
and Po under Bernoulli inputs, we have that Var(P) < Oc4^m{t'^^)\P\2, which satisfies one of the 
necessary conditions. 

□ 
7 Application to Noise Sensitivity of Polynomial Threshold Functions

7.1 Background of Noise Sensitivity Results 
7.1.1 Definitions 

If $f : \mathbb{R}^n \to \{-1, 1\}$ is a boolean function, the noise sensitivity of $f$ is a measure of the likelihood
that a small change in the input value to $f$ changes the output. There are several different notions
of noise sensitivity, suitable for slightly different contexts. We present their definitions here.

Definition. For $f : \mathbb{R}^n \to \{-1, 1\}$ a boolean function, we define its average sensitivity to be

\[ \mathrm{AS}(f) := \sum_{i=1}^{n} \Pr_{A \in_u \{-1,1\}^n}\left(f(A) \ne f(A^{(i)})\right), \]

where $A^{(i)}$ is obtained from $A$ by flipping the sign of the $i$-th coordinate. In other words, the average
sensitivity is the expected number of coordinates of $A$ that could be changed in order to change the
value of $f$.
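To make the definition concrete, the following sketch computes the average sensitivity exactly by enumerating the hypercube. The helper names (`sgn`, `average_sensitivity`, `majority3`) are ours, chosen only for illustration, and the enumeration is feasible only for small $n$.

```python
from itertools import product


def sgn(t):
    """Sign function with values in {-1, 1}; the inputs used here are never 0."""
    return 1 if t > 0 else -1


def average_sensitivity(f, n):
    """Exact AS(f): sum over i of Pr_A[f(A) != f(A^(i))] for uniform A in {-1,1}^n."""
    changes = 0
    for a in product((-1, 1), repeat=n):
        fa = f(a)
        for i in range(n):
            # A^(i): flip the sign of the i-th coordinate only.
            flipped = a[:i] + (-a[i],) + a[i + 1:]
            if f(flipped) != fa:
                changes += 1
    return changes / 2 ** n


def majority3(a):
    """The degree-1 PTF sgn(a_1 + a_2 + a_3)."""
    return sgn(sum(a))


print(average_sensitivity(majority3, 3))  # 1.5
```

For the three-variable majority, each of the six inputs with coordinate sum $\pm 1$ has exactly two sensitive coordinates, which gives $12/8 = 1.5$.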

We also define the average sensitivity in the Gaussian setting: 

Definition. For $f : \mathbb{R}^n \to \{-1, 1\}$ a boolean function, we define its Gaussian average sensitivity
to be

\[ \mathrm{GAS}(f) := \sum_{i=1}^{n} \Pr\left(f(X) \ne f(X^{(i)})\right), \]

where above $X$ is a Gaussian random variable and $X^{(i)}$ is obtained from $X$ by replacing the $i$-th
coordinate by an independent random Gaussian.

A related notion is that of noise sensitivity in the Bernoulli or Gaussian context. Whereas 
average sensitivity counts the expected number of coordinates that could be changed to alter the 
sign of /, noise sensitivity measures the probability that the sign of / changes if each coordinate is 
changed by a small amount. In particular we define: 

Definition. For $f : \mathbb{R}^n \to \{-1, 1\}$ a boolean function, and $1 > \delta > 0$, we define the noise sensitivity
of $f$ with parameter $\delta$ to be

\[ \mathrm{NS}_\delta(f) := \Pr\left(f(A) \ne f(B)\right), \]

where $A$ and $B$ are Bernoulli random variables with $B$ obtained from $A$ by flipping the sign of each
coordinate randomly and independently with probability $\delta$.
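The definition can be checked on toy examples with a brute-force computation. The sketch below enumerates every input and every flip pattern, so it runs in time $4^n$ and is only meant for very small $n$; the function name `noise_sensitivity` is ours.

```python
from itertools import product


def noise_sensitivity(f, n, delta):
    """Exact NS_delta(f) = Pr[f(A) != f(B)] for uniform A and B a delta-noisy copy of A."""
    total = 0.0
    for a in product((-1, 1), repeat=n):
        fa = f(a)
        for flips in product((0, 1), repeat=n):
            k = sum(flips)
            # Probability of this exact flip pattern under independent delta-flips.
            weight = delta ** k * (1 - delta) ** (n - k)
            b = tuple(-x if s else x for x, s in zip(a, flips))
            if f(b) != fa:
                total += weight
    return total / 2 ** n


# A single coordinate changes sign exactly when it is flipped.
print(noise_sensitivity(lambda a: a[0], 3, 0.1))
```

For a single coordinate the result is $\delta$, and for the product of two coordinates it is $2\delta(1-\delta)$, the probability that exactly one of the two is flipped.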

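Definition. For $f : \mathbb{R}^n \to \{-1, 1\}$ a boolean function, and $1 > \delta > 0$, we define the Gaussian noise
sensitivity of $f$ with parameter $\delta$ to be

\[ \mathrm{GNS}_\delta(f) := \Pr\left(f(X) \ne f(Y)\right), \]

where $X$ and $Y$ are Gaussian random variables that together form a joint Gaussian with

\[ \mathrm{Cov}(X_i, Y_j) = \begin{cases} 1 - \delta & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]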
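One way to realize the joint distribution in this definition is to set $Y = (1-\delta)X + \sqrt{1-(1-\delta)^2}\,Z$ for an independent Gaussian $Z$. The sketch below uses this construction for a Monte Carlo estimate; the function names, sample size, and seed are arbitrary illustrative choices. For a halfspace through the origin the exact value is $\arccos(1-\delta)/\pi$ by Sheppard's formula, which gives a sanity check.

```python
import math
import random


def correlated_gaussians(n, delta, rng):
    """Sample (X, Y) standard Gaussian in R^n with Cov(X_i, Y_i) = 1 - delta."""
    rho = 1.0 - delta
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [rho * xi + math.sqrt(1.0 - rho * rho) * zi for xi, zi in zip(x, z)]
    return x, y


def estimate_gns(f, n, delta, samples, rng):
    """Monte Carlo estimate of GNS_delta(f) = Pr[f(X) != f(Y)]."""
    disagreements = 0
    for _ in range(samples):
        x, y = correlated_gaussians(n, delta, rng)
        if f(x) != f(y):
            disagreements += 1
    return disagreements / samples


def halfspace(x):
    """The degree-1 PTF sgn(x_1)."""
    return 1 if x[0] > 0 else -1


estimate = estimate_gns(halfspace, 2, 0.1, 20000, random.Random(0))
exact = math.acos(1.0 - 0.1) / math.pi
print(estimate, exact)
```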
7.1.2 Previous Work



The main conjecture about the noise sensitivity of polynomial threshold functions was given in [8]:

Conjecture 44 (Gotsman-Linial). Let $f$ be a degree-$d$ polynomial threshold function in $n$ variables,
then

\[ \mathrm{AS}(f) \le 2^{-n+1} \sum_{k=0}^{d-1} \binom{n}{\lfloor (n-k)/2 \rfloor}\left(n - \lfloor (n-k)/2 \rfloor\right). \]

Remark. It should be noted that the upper bound conjectured above is actually obtainable. In
particular, if $f$ is the polynomial threshold function associated to the polynomial

\[ \prod_{i=1}^{d}\left(\sum_{j=1}^{n} x_j - d + 2i - 1/2\right), \]

then $f$ achieves this bound.
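The conjectured bound is easy to evaluate and to compare against brute force for small parameters. The sketch below assumes the symmetric polynomial $\prod_{i=1}^{d}(\sum_j x_j - d + 2i - 1/2)$ as the candidate extremal example (the formula is partly lost in the source, so this form is a reconstruction), and the helper names are ours. Equality is checked only for a few small cases; for other parameters the code only checks that the bound is not exceeded.

```python
from itertools import product
from math import comb


def gotsman_linial_bound(n, d):
    """Right-hand side of the Gotsman-Linial conjecture for degree d in n variables."""
    return 2 ** (1 - n) * sum(
        comb(n, (n - k) // 2) * (n - (n - k) // 2) for k in range(d)
    )


def average_sensitivity(f, n):
    """Exact AS(f) by enumeration of {-1,1}^n."""
    changes = 0
    for a in product((-1, 1), repeat=n):
        fa = f(a)
        for i in range(n):
            flipped = a[:i] + (-a[i],) + a[i + 1:]
            if f(flipped) != fa:
                changes += 1
    return changes / 2 ** n


def symmetric_ptf(d):
    """sgn of prod_{i=1..d} (sum(a) - d + 2i - 1/2); an assumed extremal example."""
    def f(a):
        s = sum(a)
        value = 1.0
        for i in range(1, d + 1):
            value *= s - d + 2 * i - 0.5  # never zero because s is an integer
        return 1 if value > 0 else -1
    return f


for n, d in [(3, 1), (4, 1), (4, 2), (5, 2)]:
    print(n, d, average_sensitivity(symmetric_ptf(d), n), gotsman_linial_bound(n, d))
```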

In particular, Conjecture 44 implies that

\[ \mathrm{AS}(f) = O(d\sqrt{n}). \]

By the work of [10], this implies bounds on other notions of sensitivity. In particular it would imply
that

\[ \mathrm{NS}_\delta(f) = O(d\sqrt{\delta}) \]

and

\[ \mathrm{GNS}_\delta(f) = O(d\sqrt{\delta}). \]

Furthermore, this would imply the following bound on the Gaussian average sensitivity:

\[ \mathrm{GAS}(f) = O(d\sqrt{n}). \]

In particular, we have:

Lemma 45. The largest Gaussian average sensitivity of any degree-d polynomial threshold function 
in n variables is at most the largest average sensitivity of a degree-d polynomial threshold function 
in n variables. 

Proof. We will show that if $f$ is a degree-$d$ PTF in $n$ variables, then $\mathrm{GAS}(f)$ can be written as an
expectation over the average sensitivities of certain other degree-$d$ PTFs in $n$ variables. The key to
this argument is to produce the correct distribution on pairs of Gaussians that differ in exactly one
coordinate in an unusual way. In particular, we define $n$-variable Gaussians $Z$ and $Z'$ as follows:

\[ Z_i = \frac{1}{\sqrt{2}}(X_i + A_iY_i), \qquad Z'_i = \frac{1}{\sqrt{2}}(X_i + B_iY_i), \]

where $X_i, Y_i$ are independent Gaussian random variables, and $A = (A_1, \ldots, A_n)$, $B = (B_1, \ldots, B_n)$
are Bernoulli random variables that differ only in a single, uniformly random coordinate. It is clear that $Z$
and $Z'$ are random Gaussians that agree in all but one of their coordinates, and that they are
independent in the coordinate on which they differ. Thus, since the differing coordinate is uniform among the $n$ coordinates,

\[ \mathrm{GAS}(f) = n\Pr\left(f(Z) \ne f(Z')\right). \]
On the other hand, after fixing values of $X$ and $Y$, we may define a new degree-$d$ PTF $f_{X,Y}$ by

\[ f_{X,Y}(A) := f\left(\tfrac{1}{\sqrt{2}}(X_1 + A_1Y_1), \ldots, \tfrac{1}{\sqrt{2}}(X_n + A_nY_n)\right). \]
Therefore, we have that

\[ \mathrm{GAS}(f) = n\Pr\left(f(Z) \ne f(Z')\right) = n\,\mathbb{E}_{X,Y}\left[\Pr\left(f_{X,Y}(A) \ne f_{X,Y}(B)\right)\right] = \mathbb{E}_{X,Y}\left[\mathrm{AS}(f_{X,Y})\right]. \]

This is at most the maximum possible average sensitivity of a degree-$d$ PTF in $n$ variables. □
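The identity at the end of the proof above can be tested numerically. In the sketch below, `restricted_ptf` builds $f_{X,Y}(A) = f\bigl((X + A \cdot Y)/\sqrt{2}\bigr)$ coordinatewise and `estimate_gas` averages its exact average sensitivity over sampled $(X, Y)$; the names, sample count, and seed are illustrative assumptions. For $f(z) = \mathrm{sgn}(z_1)$ the Gaussian average sensitivity is $1/2$, since resampling the first coordinate changes its sign with probability $1/2$ and the other coordinates are irrelevant.

```python
import math
import random
from itertools import product


def average_sensitivity(f, n):
    """Exact AS(f) by enumeration of {-1,1}^n."""
    changes = 0
    for a in product((-1, 1), repeat=n):
        fa = f(a)
        for i in range(n):
            flipped = a[:i] + (-a[i],) + a[i + 1:]
            if f(flipped) != fa:
                changes += 1
    return changes / 2 ** n


def restricted_ptf(f, x, y):
    """Return f_{X,Y}, the map A -> f(Z) with Z_i = (x_i + A_i * y_i) / sqrt(2)."""
    def g(a):
        z = tuple((xi + ai * yi) / math.sqrt(2) for xi, ai, yi in zip(x, a, y))
        return f(z)
    return g


def estimate_gas(f, n, samples, rng):
    """Estimate GAS(f) as the average of AS(f_{X,Y}) over Gaussian X, Y."""
    total = 0.0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        total += average_sensitivity(restricted_ptf(f, x, y), n)
    return total / samples


def halfspace(z):
    """The degree-1 PTF sgn(z_1)."""
    return 1 if z[0] > 0 else -1


gas_estimate = estimate_gas(halfspace, 2, 4000, random.Random(0))
print(gas_estimate)
```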

Proving the conjectured bounds for the various notions of sensitivity has proved to be quite 
difficult. The degree-1 case of Conjecture [44] was known to Gotsman and Linial. The first non- 
trivial bounds for higher degrees were obtained independently by [lOj and 0, who later combined 
their papers into [4]. They essentially proved bounds on average sensitivities of $O_d(n^{1-1/O(d)})$ and
bounds on noise sensitivities of $O_d(\delta^{1/O(d)})$. For the special case of Gaussian noise sensitivity, the
author proved essentially optimal bounds in [12] of $O(d\sqrt{\delta})$. In this section, we improve on these
bounds and in particular show that $\mathrm{AS}(f) = O_{c,d}(n^{5/6+c})$. Our basic technique will be to compare
$\mathrm{NS}_\delta(f)$ to $\mathrm{GNS}_{2\delta}(f)$ using an appropriate invariance principle. It should be noted that this idea
could have been applied using traditional means, but that the bound obtained would not have been
better than $\delta^{O(1/d)}$.

7.2 Noise Sensitivity Bounds 

In this section, we prove the following three theorems: 

Theorem 46. If $f$ is a degree-$d$ polynomial threshold function, and if $c, \delta > 0$, then

\[ \mathrm{NS}_\delta(f) = O_{c,d}(\delta^{1/6-c}). \]

Theorem 47. If $f$ is a degree-$d$ polynomial threshold function in $n$ variables, and if $c > 0$, then

\[ \mathrm{AS}(f) = O_{c,d}(n^{5/6+c}). \]

Theorem 48. For $f$ a degree-$d$ polynomial threshold function in $n$ variables and $c > 0$,

\[ \mathrm{GAS}(f) = O_{c,d}(n^{5/6+c}). \]
We begin with the proof of Theorem 46 in the case of regular polynomial threshold functions.

Proposition 49. Let $f = \mathrm{sgn} \circ p$ be a polynomial threshold function for $p$ a degree-$d$ polynomial
with a $(\tau, N, m, \epsilon)$-regular decomposition for $1/2 > \epsilon, \tau > 0$. Let $1 > \delta > 0$, then

\[ \mathrm{NS}_\delta(f) = O(d\sqrt{\delta}) + O\left(d\epsilon^{1/2d}\log(\epsilon^{-1})\right) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right). \]

The proof of Proposition 49 will be to use the replacement method to show that $\mathrm{NS}_\delta(f)$ is
approximately $\mathrm{GNS}_{2\delta}(f)$, which we bound using the main theorem of [12]. Unfortunately, we will
not be able to apply Proposition 21 directly, but many of the techniques will be similar.
Proof. Let $A^1, A^2$ be a pair of Bernoulli random variables so that for each coordinate $i$, $A^1_i$ and $A^2_i$
are equal with probability $1 - \delta$ independently over different $i$. Then $\mathrm{NS}_\delta(f) = \Pr(f(A^1) \ne f(A^2)) =
2\Pr(f(A^1) = 1, f(A^2) = -1)$. We wish to bound this latter probability.

Let $X^1$ and $X^2$ be Gaussian random variables so that the joint distribution is a
Gaussian with

\[ \mathrm{Cov}(X^1_i, X^2_j) = \begin{cases} 1 - 2\delta & \text{if } i = j, \\ 0 & \text{otherwise.} \end{cases} \]

Note that all of the first three moments of $(A^1, A^2)$ are identical to the corresponding moments of
$(X^1, X^2)$.

We are given that there exists a polynomial po with \p — Polis ^ £Var(po) so that po has a 
(r, A^)-diffuse decomposition (h,qi, . . . ,qm) with multilinear and Infj(gj) < r for all After 
rescaling these polynomials, we may assume that Var(po) < \P0\2 — 1- Note that by Corollary 
[26] that with probability 1 - 0{e) that \p{A') - po{A')\ < e^/"^ log{e~'^)'^ for each of i = 1,2. By 
Proposition [17] there exist functions f^,f'^ : — )• [0, 1] so that: 

• f^{x) = 1 if h{x) + log(l + £-1)'^ > 0. 

• f(x) = 1 if h{x) - log(l + e-'^Y < 0- 
• 

E[f\q,iX'), qmiX'))] - E[I^o,oo)iHqiiX'), q^iX')) + e^^ log(l + e-^)] 

= Od,™(iVrV5log(r-i)'^-/2+i). 

E[/2(gi(x2), . . . , - E[/(_^,o)(/ite(^'), • • • , qm{X')) " e'^' log(l + e-y)] 

• |(f )(^^|oo = ©^(r-'^/S) for 1 < A; < 4. 
We then have that 



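\[ \mathrm{NS}_\delta(f) = 2\Pr\left(f(A^1) = 1, f(A^2) = -1\right) \le \mathbb{E}\left[f^1(q_1(A^1), \ldots, q_m(A^1))\, f^2(q_1(A^2), \ldots, q_m(A^2))\right] + O(\epsilon). \]

We would like to relate

\[ \mathbb{E}\left[f^1(q_1(A^1), \ldots, q_m(A^1))\, f^2(q_1(A^2), \ldots, q_m(A^2))\right] \]

to

\[ \mathbb{E}\left[f^1(q_1(X^1), \ldots, q_m(X^1))\, f^2(q_1(X^2), \ldots, q_m(X^2))\right]. \]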

In particular, we have that with respect to the Gaussian distribution, f^{qi{X), . . . , qm{X)) differs 
from I(o,oo)(±(Po(^) -e^/^log(e-i)'=')) with probability at most Od,„(iVrV5 log(r-i)^™/2+i). This 
in turn differs from /(o,oo)(='=Po(^)) with probability at most 0{de^^'^^ log(e~^)) by Lemma[2] Hence 
we have that 
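\[ \begin{aligned} \mathbb{E}\left[f^1(q_1(X^1), \ldots, q_m(X^1))\, f^2(q_1(X^2), \ldots, q_m(X^2))\right] &= O\left(d\epsilon^{1/2d}\log(\epsilon^{-1})\right) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right) + \mathrm{GNS}_{2\delta}(f) \\ &= O\left(d\epsilon^{1/2d}\log(\epsilon^{-1})\right) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right) + O(d\sqrt{\delta}), \end{aligned} \]

where the bound on the Gaussian noise sensitivity comes from the main theorem of [12].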

Thus, we are left with the task of bounding the difference between $\mathbb{E}[f^1(q_i(A^1))f^2(q_i(A^2))]$ and
$\mathbb{E}[f^1(q_i(X^1))f^2(q_i(X^2))]$. We do this with the replacement method. We let $Z^{i,\ell}$ be the random
vector whose $j$-th component is $A^i_j$ if $j > \ell$ and $X^i_j$ otherwise. We note that $Z^{i,0} = A^i$ and $Z^{i,n} = X^i$.
We proceed to bound the difference

\[ \left|\mathbb{E}\left[f^1(q_i(Z^{1,j-1}))f^2(q_i(Z^{2,j-1}))\right] - \mathbb{E}\left[f^1(q_i(Z^{1,j}))f^2(q_i(Z^{2,j}))\right]\right|. \qquad (15) \]

We note that $Z^{i,j-1}$ and $Z^{i,j}$ agree in all but the $j$-th coordinate. Thus, in bounding the difference
above we may consider all but the $j$-th coordinate fixed. We then approximate the resulting function
of $Z^1_j, Z^2_j$ by its Taylor series. In particular, if we let $z_i = Z^i_j$, then for appropriate functions $g_1$
and $g_2$ (depending on the other coordinates of $Z$) we need to consider $\mathbb{E}[g_1(z_1)g_2(z_2)]$. We have
that $g_1(z_1)g_2(z_2)$ equals a degree 3 polynomial in $z_1$ and $z_2$ plus an error of at most

\[ \frac{z_1^4 g_1''''(t_1)g_2(0)}{24} + \frac{z_1^3 z_2\, g_1'''(t_2)g_2'(t_3)}{6} + \frac{z_1^2 z_2^2\, g_1''(t_4)g_2''(t_5)}{4} + \frac{z_1 z_2^3\, g_1'(t_6)g_2'''(t_7)}{6} + \frac{z_2^4 g_1(0)g_2''''(t_8)}{24} \]

for some points $t_i$. Since the expectations of the degree 3 polynomials in $z_1$ and $z_2$ are the same
in the Bernoulli and Gaussian case, and since the fourth moments are bounded, we have that the
difference in Equation (15) is

\[ O\left(\mathbb{E}\left[|g_1''''|_\infty + |g_1'''g_2'|_\infty + |g_1''g_2''|_\infty + |g_1'g_2'''|_\infty + |g_2''''|_\infty\right]\right). \]

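Now the $k$-th derivative of $g_s$ (for $s \in \{1, 2\}$) can be written as

\[ \sum_{\ell_1, \ldots, \ell_k} \frac{\partial^k f^s}{\partial q_{\ell_1} \cdots \partial q_{\ell_k}} \prod_{r=1}^{k} \frac{\partial q_{\ell_r}(Z^s)}{\partial z_s}. \]

On the other hand, by assumption, this partial derivative of $f^s$ is at most $O_m(\tau^{-k/5})$ and the product
is at most

\[ \left(\max_{\ell}\left|\frac{\partial q_\ell(Z^s)}{\partial z_s}\right|\right)^{k}. \]

Thus, the total error in Equation (15) is at most

\[ O\left(m^4\tau^{-4/5}\,\mathbb{E}\left[\sum_{\ell=1}^{m}\sum_{s=1}^{2}\left|\frac{\partial q_\ell(Z^s)}{\partial z_s}\right|^4\right]\right). \]

It is clear that

\[ \mathbb{E}\left[\left(\frac{\partial q_\ell(Z^1)}{\partial z_1}\right)^2\right] = \mathbb{E}\left[\left(\frac{\partial q_\ell(Z^2)}{\partial z_2}\right)^2\right] = \mathrm{Inf}_j(q_\ell). \]
Thus, $\partial q_\ell(Z^s)/\partial z_s$ is a polynomial in independent Bernoulli and Gaussian random variables with second
moment $\mathrm{Inf}_j(q_\ell)$. Therefore by Lemma 27 its fourth moment is $O_d(\mathrm{Inf}_j(q_\ell)^2)$. Therefore we have
that the expression in Equation (15) is at most

\[ O_{d,m}\left(\tau^{-4/5}\sum_{\ell}\mathrm{Inf}_j(q_\ell)^2\right). \]

Therefore, summing this over $j$, we get that

\[ \left|\mathbb{E}\left[f^1(q_i(A^1))f^2(q_i(A^2))\right] - \mathbb{E}\left[f^1(q_i(X^1))f^2(q_i(X^2))\right]\right| \]

is at most

\[ O_{d,m}\left(\tau^{-4/5}\sum_{\ell, j}\mathrm{Inf}_j(q_\ell)^2\right). \]

On the other hand, for fixed $\ell$ we have that $\sum_j \mathrm{Inf}_j(q_\ell) = O_d(1)$ and that for each $j$, $\mathrm{Inf}_j(q_\ell) \le \tau$.
Therefore, $\sum_j \mathrm{Inf}_j(q_\ell)^2 = O_d(\tau)$. Thus, we have that

\[ \left|\mathbb{E}\left[f^1(q_i(A^1))f^2(q_i(A^2))\right] - \mathbb{E}\left[f^1(q_i(X^1))f^2(q_i(X^2))\right]\right| = O_{d,m}(\tau^{1/5}). \]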

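Recall though that

\[ \mathrm{NS}_\delta(f) \le \mathbb{E}\left[f^1(q_i(A^1))f^2(q_i(A^2))\right] + O(\epsilon) \]

and that

\[ \mathbb{E}\left[f^1(q_i(X^1))f^2(q_i(X^2))\right] = O\left(d\epsilon^{1/2d}\log(1 + \epsilon^{-1})\right) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right) + O(d\sqrt{\delta}). \]
Combining these yields our result.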

□ 
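The bookkeeping behind the replacement method used in the proof above can be sketched in a few lines. The function `hybrid` builds the vector $Z^{\ell}$ whose first $\ell$ coordinates come from the Gaussian vector and whose remaining coordinates come from the Bernoulli vector, and `telescoping_terms` lists the one-coordinate differences whose sum is the total change. The names and the test function are illustrative only; the proof bounds the expectation of each term, which this sketch does not attempt.

```python
def hybrid(a, x, l):
    """Z^l: coordinates 1..l are taken from x, coordinates l+1..n from a."""
    return list(x[:l]) + list(a[l:])


def telescoping_terms(func, a, x):
    """Differences func(Z^{l-1}) - func(Z^l) for l = 1..n; they sum to func(a) - func(x)."""
    n = len(a)
    return [func(hybrid(a, x, l - 1)) - func(hybrid(a, x, l)) for l in range(1, n + 1)]


a = [1, -1, 1]           # a Bernoulli sample
x = [0.3, -1.2, 2.0]     # a Gaussian sample (fixed here for reproducibility)


def smooth(z):
    """An arbitrary smooth test function of a multilinear polynomial."""
    return (z[0] * z[1] + z[2]) ** 2


terms = telescoping_terms(smooth, a, x)
print(sum(terms), smooth(a) - smooth(x))
```

Consecutive hybrids differ in a single coordinate, which is what allows each term to be handled by a one-variable Taylor expansion.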

We are now prepared to prove Theorem 46.



Proof. Write $f = \mathrm{sgn} \circ p$ for $p$ a degree-$d$ polynomial. We will reduce to the case of Proposition 49 by
use of Theorem 34. In particular, we may write $p$ as a decision tree of depth $O_{c,d}(\delta^{-5/6}\log(\delta^{-1})^{O(d)})$
so that a 1—5^^^ fraction of the leaves are polynomials with either a {5^^^ , 5~^^'^ , Oc,cz(l)i ^^'^)-i"egular 
decomposition or with variance less than 5"^ times their squared mean. 

Consider $A^1$ and $A^2$ random Bernoulli variables that differ in each coordinate independently
with probability $\delta$. Consider the path on the decision tree above followed by $A^1$. With probability
at least $1 - \delta^{1/6}$ the resulting leaf satisfies one of the two cases specified by Theorem 34. Furthermore,
with probability at least $1 - O_{c,d}(\delta^{1/6}\log(\delta^{-1})^{O(d)})$, $A^2$ agrees with $A^1$ on all coordinates
queried by the decision tree. Conditioned on this occurrence, the probability that $p(A^1)$ and $p(A^2)$
have different signs is equal to the noise sensitivity with parameter $\delta$ of the polynomial threshold
function defined by the leaf. If the leaf has a ((5^/^, (J"'^''^, Oc,rf(l), (5'^'^)-regular decomposition, this 
is $O_{c,d}(\delta^{1/6-c})$ by Proposition 49. If this polynomial has low variance compared to its mean, then
both $p(A^1)$ and $p(A^2)$ are the same sign as the mean of $p$ with high probability by Corollary 26.
Thus, we have that

\[ \mathrm{NS}_\delta(f) \le \delta^{1/6} + O_{c,d}\left(\delta^{1/6}\log(\delta^{-1})^{O(d)}\right) + O_{c,d}(\delta^{1/6-c}) = O_{c,d}(\delta^{1/6-c}). \]

□ 

Theorem 47 now follows immediately by Lemma 8.1 of [10], and Theorem 48 follows from
Theorem 47 and Lemma 45.
8 Application to PRGs for PTFs with Bernoulli Inputs



In [16], Meka and Zuckerman developed a relatively small pseudorandom generator for polynomial
threshold functions with Bernoulli inputs. Their generator was defined as follows. Let $h : [n] \to [a]$
be a hash function picked from a 2-independent family. Let $A^1, \ldots, A^a : [n] \to \{-1, 1\}$ be chosen
independently from a $k$-independent hash family. Meka and Zuckerman's generator is given by
Ai = A^^^\ Meka and Zuckerman show that for appropriate chosen m = 0(e~^) and a = 0{e~'-^^'^^) 
that this generator fools all degree-d polynomial threshold functions to within e. 
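The indexing in this construction can be illustrated with a short sketch. It is not a pseudorandom generator: as an explicit simplification, the 2-independent hash $h$ and the $k$-independent strings are replaced by fully random choices, so the seed is not short. The name `mz_style_generator` and its return convention are ours; only the selection rule, that the $i$-th output coordinate is the $i$-th coordinate of the string indexed by $h(i)$, follows the text.

```python
import random


def mz_style_generator(n, a, rng):
    """Return (A, h, blocks) where A[i] = blocks[h[i]][i] (0-indexed)."""
    # Stand-in for a 2-independent hash h : [n] -> [a] (here: fully random).
    h = [rng.randrange(a) for _ in range(n)]
    # Stand-ins for the k-independent strings A^1, ..., A^a (here: fully random).
    blocks = [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(a)]
    # Coordinate i of the output is read from the block selected by h(i).
    output = [blocks[h[i]][i] for i in range(n)]
    return output, h, blocks


output, h, blocks = mz_style_generator(12, 3, random.Random(1))
print(output)
```

In an actual instantiation the random choices above would be replaced by limited-independence families, which is where the seed-length savings come from.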

Meka and Zuckerman's proof is essentially to think of $h$ as constant and to use the replacement
method to bound the expected errors as the $A^i$ are replaced by random Gaussian vectors one at a
time. If the polynomial in question is sufficiently regular, then these errors will be small, and thus
the expected value of the PTF in question over the PRG will be close to the expected value of the
PTF over random Gaussian inputs, and by the Invariance Principle, the expected value at random
Bernoulli inputs will also be close. Unfortunately, this technique had been limited by the classical
Invariance Principle and Regularity Lemma, and thus could not produce a PRG of seed length
less than $\epsilon^{-\Omega(d)}$. In this section, we will show how our Diffuse Invariance Principle and Regularity
Lemma can improve this to produce a PRG of seed length $O_d(\log(n)\epsilon^{-O(1)})$.

We begin by producing a pseudorandom generator that works in the case of regular polynomials, 
and then reducing the general case to this one. 

8.1 The Regular Case 

Proposition 50. Let $p$ be a degree-$d$ polynomial in $n$ variables with a $(\tau, N, m, \epsilon)$-regular decomposition.
Let $a$ be a positive integer. Let $h : [n] \to [a]$ be picked randomly from a 2-independent
hash family and for each $h$ let $A^1, \ldots, A^a : [n] \to \{-1, 1\}$ be picked independently from $4d$-independent
hash families. Define the $n$-variable function $A$ in terms of $h$ and the $A^i$ as $A_i = A^{h(i)}_i$. Then if $B$ is
a Bernoulli random variable, $|\mathbb{E}[\mathrm{sgn}(p(A))] - \mathbb{E}[\mathrm{sgn}(p(B))]|$ is at most

\[ O_{d,m}\left(N\tau^{1/5}\log(1 + \tau^{-1})^{dm/2+1}\right) + O\left(d\epsilon^{1/2d}\log(\epsilon^{-1})^{1/2}\right) + O(a^{-1}\tau^{-1}). \]
We begin by showing that a similar statement holds for an appropriate choice of $h$.

Lemma 51. Let $p$ and $p_0$ be degree-$d$ polynomials with $|p - p_0|_2^2 \le \epsilon^2\mathrm{Var}(p_0)$ so that $p_0$ has a
$(\tau, N)$-diffuse decomposition $(g, q_1, \ldots, q_m)$ with $q_i$ multilinear ($1/2 > \epsilon, \tau > 0$). Suppose furthermore
that $h : [n] \to [a]$ is a function so that

\[ \sum_{j=1}^{a}\left(\sum_{t:\,h(t)=j}\sum_{i}\mathrm{Inf}_t(q_i)\right)^2 \le \tau. \]

Let $A^1, \ldots, A^a : [n] \to \{-1, 1\}$ be picked independently from a $4d$-independent hash family. Define
the random variable $A$ so that its $i$-th coordinate is the $i$-th coordinate of $A^{h(i)}$. Then for $G$ a random
Gaussian we have that

\[ \left|\mathbb{E}[\mathrm{sgn}(p(A))] - \mathbb{E}[\mathrm{sgn}(p_0(G))]\right| \le O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right) + O(d\epsilon^{1/2d}). \]
Proof. We show that

\[ \Pr(p(A) < 0) \le \Pr(p_0(G) < 0) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right) + O(d\epsilon^{1/2d}). \]
The other direction will follow analogously.
First, we note that by Corollary 26, with probability $1 - O(\epsilon)$ we have $|p(A) - p_0(A)| <
\epsilon^{1/2}\sqrt{\mathrm{Var}(p_0)} \le \epsilon^{1/2}|p_0|_2$. Therefore,

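\[ \Pr(p(A) < 0) \le \Pr\left(p_0(A) < -\epsilon^{1/2}|p_0|_2\right) + O(\epsilon). \]

On the other hand,

\[ \Pr\left(p_0(G) < -\epsilon^{1/2}|p_0|_2\right) = \Pr(p_0(G) < 0) + O(d\epsilon^{1/2d}) \]

by Lemma 2. Hence it will suffice to prove that

\[ \Pr\left(p_0(A) < -\epsilon^{1/2}|p_0|_2\right) \le \Pr\left(p_0(G) < -\epsilon^{1/2}|p_0|_2\right) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right). \]

Modifying $p_0$ by $\epsilon^{1/2}|p_0|_2$, it suffices to prove under the same hypothesis that

\[ \Pr(p_0(A) < 0) \le \Pr(p_0(G) < 0) + O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right). \]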

The proof is by Proposition 21. Let $B^i$ be the vector of entries $A^i_j$ of $A^i$ for which $h(j) = i$.
Reordering the coordinate variables, we can make it so that $A = (B^1, \ldots, B^a)$. Similarly, let
$G = (G^1, \ldots, G^a)$. Note that since the $q_i$ are multilinear and of degree at most $d$, any degree-3
polynomial in the $q_i$ has the same expectation under the $B^i$ as under the $G^i$. We may thus apply
Proposition 21 with $k = 4$. We have that

\[ \left|\Pr(p_0(A) < 0) - \Pr(p_0(G) < 0)\right| = O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1} + \tau^{-4/5}T\right). \]
Recall that T above is 



(16) 



where Tjj is 



E 
+E 



{q^{B\ 



, B^, G^+\ . . . , G") - Ey . . . , B^-\Y, 

, B^-\ G^ . . . , G'') - ^Y[q^[B\ B^-\Y, G^+\ 



Od 
+Od fE 



By the 4d- independence of the i?*, this expectation is the same as it would be if they were fully 
independent Bernoulli variables. Thus, by Lemma [27l this is at most 

{q,{B\ B\ ...,G-)- EyMB\ B^-\Y, G-)]f 

{q,{B\...,B^-\&,...,G'^)-EYhiB\...,B^-\Y,G^+\...,Gn]f 

Since the terms in the expectations above are at most quadratic in any coordinate, the expectation 
is unchanged by replacing Gaussian inputs with Bernoullis and hence 



Tij = Od (e^i_ _^jVarB, fe(S))]2) . 



The variance above is clearly the sum of the squares of the coefficients of the non-constant terms of 
the polynomial obtained by substituting the values oi B^ , . . . , B^ , . . . , i?" into Qi. The expectation 






of this is easily seen to be the sum of the squares of the coefficients of the monomials in qi containing 
at least one of the variables. This in turn is clearly at most Yliih{i)=j^'^^ti^i)- Thus, 

m a 

i=i j=i 

ma/ \ 

i=l j=l \£:h{e)=j 
a / \ ^ 

j = l \£:h(£)=j i 
< T. 

Thus, by Equation (16),

\[ \left|\Pr(p_0(A) < 0) - \Pr(p_0(G) < 0)\right| = O_{d,m}\left(N\tau^{1/5}\log(\tau^{-1})^{dm/2+1}\right), \]

completing our proof. □ 

We can now prove Proposition 50.

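Proof. Let $q_1, \ldots, q_m$ be as given in the $(\tau, N, m, \epsilon)$-regular decomposition of $p$.

By the above Lemma, it suffices to prove that with probability $1 - O(a^{-1}\tau^{-1})$ over $h$ we have

\[ \sum_{j=1}^{a}\left(\sum_{i:\,h(i)=j}\sum_{\ell}\mathrm{Inf}_i(q_\ell)\right)^2 = O_{d,m}(\tau). \]

On the other hand, by the Cauchy-Schwarz inequality this is at most

\[ m\sum_{\ell}\sum_{i}\mathrm{Inf}_i(q_\ell)^2 + m\sum_{\ell}\sum_{i \ne i':\,h(i)=h(i')}\mathrm{Inf}_i(q_\ell)\,\mathrm{Inf}_{i'}(q_\ell). \]

Since $\sum_i \mathrm{Inf}_i(q_\ell) = O_d(1)$ for each $\ell$ and since each $\mathrm{Inf}_i(q_\ell)$ is at most $\tau$, the first term above is
$O_{d,m}(\tau)$. The expectation of the latter term above is

\[ m\sum_{\ell}\sum_{i \ne i'}\frac{1}{a}\,\mathrm{Inf}_i(q_\ell)\,\mathrm{Inf}_{i'}(q_\ell) \le \frac{m}{a}\sum_{\ell}\left(\sum_{i}\mathrm{Inf}_i(q_\ell)\right)^2 = O_{d,m}(a^{-1}). \]

Our result follows from the Markov bound on this random variable. □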
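The expectation computed in the proof above rests on the fact that two distinct coordinates collide under $h$ with probability $1/a$. The sketch below checks the resulting identity by exhaustively averaging over all functions $h : [n] \to [a]$, which form a fully independent (and hence 2-independent) family; the weights stand in for the influences and are arbitrary, and the function name is ours.

```python
from itertools import product


def expected_collision_weight(w, a):
    """Average over all h : [n] -> [a] of the sum of w[i] * w[j] over i != j with h(i) = h(j)."""
    n = len(w)
    total = 0.0
    for h in product(range(a), repeat=n):
        total += sum(
            w[i] * w[j]
            for i in range(n)
            for j in range(n)
            if i != j and h[i] == h[j]
        )
    return total / a ** n


w = [0.5, 0.25, 0.125, 0.125]   # hypothetical influence values
a = 3
lhs = expected_collision_weight(w, a)
rhs = (sum(w) ** 2 - sum(v * v for v in w)) / a   # (1/a) * sum over i != j of w_i w_j
print(lhs, rhs)
```

Markov's inequality then turns this $O(1/a)$ expectation into the statement that the colliding mass exceeds $\tau$ with probability $O(a^{-1}\tau^{-1})$.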
8.2 The General Case 

We are now prepared to state our conclusions in the general case. 
Theorem 52. Let $A$ be a random variable defined as follows. Let $h : [n] \to [a]$ be picked randomly
from a 2-independent hash family for a = . Let , . . . , A"- : [n] ^ {— li 1} be picked indepen- 
dently from k-independent hash families for k = + 4d. Let Ai = J^^^^ for 1 < i < n. Note 
that A can he generated from a seed of length 0{log{n)e~^^). Let B he a random n-dimensional 
Bernoulli random variable, and let $f$ be any degree-$d$ polynomial threshold function in $n$ variables.
Then for any $c > 0$,

\[ \left|\mathbb{E}[f(A)] - \mathbb{E}[f(B)]\right| = O_{c,d}(\epsilon^{1-c}). \]

Remark. Note that by changing the values of a and k above we can find a PRG with seed length 
O c,d{^og{n)e~^^~'^) that fools degree-d PTFs to within e. 

Proof. Note that the coordinates of $A$ are $k$-independent (since they are for each possible value of
$h$). Assume that $\epsilon$ is sufficiently small (since otherwise there is nothing to prove). By Theorem 34,
we know that / can be written as a decision tree of depth so that with probability 1 — 0(e) 
a randomly chosen leaf is of the form sgn o p where either Yai{p{B)) < e'^\K\p{B)]\ or p has an 
(e^, e~^/^, Oc.di^), e^'^)-regular decomposition. For each such decision-tree path, condition on A and 
B on having the appropriate values on the appropriate coordinates defining this branch of the 
decision tree. Note that the conditional distribution on A can be written in the same form as A 
was originally written only with the A^ perhaps only being 4(i-independent. 

There is a probability of $1 - O(\epsilon)$ that $p$ satisfies one of the two conditions outlined above. If
the former condition holds, both $p(A)$ and $p(B)$ have the same sign as $\mathbb{E}[p(B)]$ with probability
$1 - O(\epsilon)$. In the latter case, by Proposition 50 we have that for an appropriate $p_0$

\[ \mathbb{E}[\mathrm{sgn}(p(A))] = \mathbb{E}[\mathrm{sgn}(p_0(G))] + O_{d,m}(\epsilon^{1-c}) = \mathbb{E}[\mathrm{sgn}(p(B))] + O_{d,m}(\epsilon^{1-c}) \]

(since $B$ is also of the form specified in Proposition 50). This completes our proof. □

9 Conclusion 

We have introduced the notion of a diffuse decomposition of a polynomial and proved that such
decompositions exist for reasonable parameters. This in turn has allowed us to make improvements on known
bounds for several major problems relating to polynomial threshold functions. There are several
directions in which this work might be expanded. Perhaps most important is that the theory
introduced in this paper may well have applications to other problems of interest in the field. On
the other hand, Theorem 1 still has room for improvement. In particular, I believe that such a
diffuse decomposition should exist with size merely polynomial in $dN/c$. Producing such a technical
improvement would allow one to noticeably improve the $d$-dependence in all of the applications
presented in this paper.